Foreword


FDA Perspective on Modeling and Simulation and the Need 
for Good Simulation Practice 


The U.S. Food and Drug Administration is responsible for protecting the public health by 
ensuring the safety and efficacy of medical products marketed in the United States. Within 
the FDA, medical products for human use are regulated by three Centers: the Center for 
Devices and Radiological Health (CDRH) regulates medical devices and in vitro diagnos- 
tics, the Center for Drug Evaluation and Research (CDER) oversees drugs, and the Center 
for Biologics Evaluation and Research (CBER) is responsible for biologic products. 

Over the past decade, the FDA has recognized the immense potential of modeling and 
simulation (M&S) as a valuable tool to complement traditional approaches in gathering 
evidence about medical products. In the early 2010s, the FDA prioritized M&S as part 
of its regulatory science agenda. The Modeling and Simulation Working Group (Mod- 
SimWG) was established in 2016, bringing together over 200 FDA scientists who use 
or review M&S. One of the key objectives of the ModSimWG is to enhance awareness 
about the various types and applications of M&S within the FDA. Therefore, in 2022 
the ModSimWG released a public report, “Successes and Opportunities in Modeling and
Simulation for FDA”.¹ This report describes how M&S is used within FDA, including
an overview of which modeling fields and which modeling applications are relevant to 
each of the FDA Centers. Additionally, it showcases 14 success stories where M&S has 
been effectively utilized. The report also identifies opportunities for FDA to maximize the 
potential of M&S in the future. 

Among these opportunities, the report highlights the importance of establishing Good 
Simulation Practice to promote consistency in M&S across the FDA. While FDA, and 
particularly CDRH, has played a leading role in developing methods for evaluating the
reliability of M&S in regulatory submissions,²,³ there remains a need for further
standardization in all aspects of M&S workflows. The implementation of Good Simulation
Practice is seen as a key factor in ensuring the sustained success of M&S in the healthcare
field.


1 Successes and Opportunities in Modeling and Simulation for FDA, https://www.fda.gov/science-research/about-science-research-fda/modeling-simulation-fda.

As part of an active collaboration between ModSimWG and the Avicenna Alliance, 
a team of 13 FDA M&S experts covering all three medical product Centers (CDRH, 
CDER, and CBER) provided feedback on each chapter of this book. These FDA scientists 
supplement the dozens of subject matter experts across academia and industry who have 
reviewed the material throughout its development. The feedback provided by the FDA 
experts represents personal opinions only and should not be construed as an actual or 
implied endorsement of the content by the FDA. However, it is hoped that this book 
will stimulate further conversations amongst all stakeholders and ultimately contribute to 
the development of community-accepted Good Simulation Practice, bringing M&S for 
medical products to maturity. 


Pras Pathmanathan 

Office of Science and Engineering 
Laboratories 

Center for Devices and Radiological 
Health 

U.S. Food and Drug Administration 
Silver Spring, Maryland, USA 


2 ASME VV-40:2018, Assessing Credibility of Computational Modeling through Verification and
Validation: Application to Medical Devices.


3 Guidance Document, Assessing the Credibility of Computational Modeling and Simulation in
Medical Device Submissions, FDA, 2023.
1 Introduction


Marco Viceconti, Liesbet Geris, Luca Emili, Axel Loewe, 
Bernard Staumont, Enrique Morales-Orcajo, Marc Horner, 
Martha De Cunha Maluf-Burgman, and Raphaëlle Lesage


1.1 Scope of this Document 


The term GxP indicates a collection of good practices, e.g., quality guidelines, to ensure 
that a product is safe and meets its intended use. The most important examples of GxP 
in biomedicine are Good Laboratory Practice (GLP) and Good Clinical Practice (GCP). 
The GLP are curated by the Organisation for Economic Co-operation and Development 
(OECD); they provide “a managerial quality control system covering the organisational 
process and the conditions under which non-clinical health and environmental studies 
are planned, performed, monitored, recorded, reported, and retained (or archived)”.¹ The
International Council for Harmonisation of Technical Requirements for Pharmaceuticals 
for Human Use (ICH) curates the GCP. GCP provides an international ethical and sci- 
entific quality standard for clinical trials to facilitate the regulatory authorities’ mutual 
acceptance of clinical evidence in the various ICH regions. GxP guidelines are available 


1 https://www.oecd.org/chemicalsafety/testing/overview-of-good-laboratory-practice.htm.


M. Viceconti (✉)
Alma Mater Studiorum, University of Bologna, Bologna, Italy
e-mail: marco.viceconti@unibo.it


L. Geris
University of Liège, KU Leuven & VPH Institute, Leuven, Belgium
e-mail: director@vph-institute.org


L. Emili
InSilicoTrials Technologies, Trieste, Italy
e-mail: luca.emili@insilicotrials.com


A. Loewe
Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
e-mail: axel.loewe@kit.edu


© The Author(s) 2024
M. Viceconti and L. Emili (eds.), Toward Good Simulation Practice, Synthesis Lectures
on Biomedical Engineering, https://doi.org/10.1007/978-3-031-48284-7_1




for different industrial sectors, including foods, medical products, medical devices, and 
cosmetics. In some cases, the GxP simply expresses best practices within an industrial 
sector; in others, they are elevated to quasi-regulatory standards, which must be met to 
achieve specific regulatory approval. 

The use of Computational Modelling and Simulation (CM&S) in clinical medicine is 
usually referred to as In Silico Medicine. The term was first used in PubMed in 2013 
and has become popular since then. The academic research community loosely uses the 
term In silico Trials to indicate the use of CM&S to assess the safety and/or efficacy 
of new healthcare products, be they medical devices, medicinal products, or others. The 
term appeared in PubMed in 2002 (Ashelford et al., 2002). One of the issues with this 
term is that it uses the term "trial" loosely, whereas in the regulatory domain, the term 
is used in a much more specific way. To avoid such confusion, going forward, we will 
use the term In silico Trials only in a colloquial way. Instead, we will use the term In 
Silico Methodology to indicate any use of CM&S as, at any level, a regulatory decision 
support tool on new medical products for which a marketing authorisation is requested, 
whether medical devices, medicinal products, or others. 

This position report on Good Simulation Practice (GSP) does not emerge from a vacuum.
For example, since 2002, at least 21% of 565 original premarket approval (PMA)
applications for medical devices included computational modelling efforts provided in
the Summary of Safety and Effectiveness Data (SSED) (Morrison et al., 2019). Thus, our
community of practice, in general, and major regulatory agencies, in particular, have been
reflecting on using predictive models as a development and de-risking tool for medical
products. In some cases, such reflections took the form of guidance documents or techni- 
cal standards for specific uses. Annexe 1 reviews the existing regulatory guidance on the 
topic. 
While the regulatory community is actively engaged in developing a comprehensive
regulatory framework for the use of computational simulations to support medical
decisions, through the concept of "Software as a Medical Device" (SaMD), a similar
level of engagement has so far been absent for the broader application
of CM&S in regulatory decision-making processes. There is only one detailed resource
for guiding the validation of In silico methodologies applied to medical devices: the
American Society of Mechanical Engineers (ASME) Verification & Validation (V&V)-40
standard,² originally published in 2018, whose original scope was limited to medical
devices (hereinafter referred to as VV-40:2018).

While the VV-40:2018 standard is a valuable resource, the authors of the present docu- 
ment believe there is a need for a document that summarises the good practices in using In 
silico methodologies to support the regulatory process for all kinds of medical products. 
Such a document could play a role similar to that of the Good Clinical Practice (GCP), the 
Good Laboratory Practice (GLP), or the Good Manufacturing Practice (GMP) guidelines. 
Thus, by analogy, it could be named “Good Modelling & Simulation Practice for medical
products”, and hopefully, it may be curated and/or adopted by the members of the International
Medical Device Regulators Forum (IMDRF). A GxP may either remain a voluntary
guideline or be elevated to a standard by standardisation bodies such as the International 
Council for Harmonisation (ICH) or the International Organization for Standardization 
(ISO). The compilation of Good Modelling & Simulation Practice for medical products 
is a challenging task. In silico methodologies have started to be adopted only recently, 
and the experience is limited. Also, the expertise required to write such a document is 
extremely multidisciplinary. 

The VPH Institute³ and the Avicenna Alliance⁴ are two international not-for-profit
organisations that represent all practitioners in the field of In silico Medicine: the first
represents the academic community, and the second the industrial community. The EU-funded
In Silico World project⁵ operates an online forum, in collaboration with the VPH
Institute and the Avicenna Alliance, called the In Silico World Community of Practice
(ISW_CoP).⁶ The over 500 experts participating in this ISW_CoP share a common
professional or educational interest in In silico Medicine. Within this community, a consensus
emerged on the opportunity to collaboratively compile a position report aimed at
summarising the current thinking within the ISW_CoP on the good practices for In silico
methodologies, so as to provide a basis for the future development of a formal standard
on the Good Modelling & Simulation Practice for medical products.


2 https://www.asme.org/codes-standards/find-codes-standards/v-v-40-assessing-credibility-computational-modeling-verification-validation-application-medical-devices.

3 https://www.vph-institute.org/.

4 https://avicenna-alliance.com/.

5 https://insilico.world/.

6 https://insilico.world/community/.
Thus, the scope of this document is to provide a list of the best practices for using
computer simulation to assess the safety and efficacy of medical products, which emerged
through a consensus process within our ISW_CoP. The form we chose is a “Position
Report”: a public document providing an expert opinion to orient policies or standards.
In this sense, the present document is not binding and represents only the consensus
among some field experts. However, we hope this document might provide a starting
point for a future standardisation effort by an appropriate body. And while CM&S is
used throughout the entire life cycle of medical products, including discovery and design,
verification, development, optimisation, re-design, etc., this position report focuses only
on its use to assess the safety and efficacy of medical products.

The first output that the ISW_CoP produced was a systematic analysis of all possible
Contexts of Use (CoU) for In Silico Methodologies (Viceconti et al., 2021a). CoUs
concisely describe how the new methodology will be used in the medical products’ 
development and regulatory assessment process. 

We used the taxonomy presented in Table 1.1 to organise the list of potential CoUs. 
The safety and efficacy of medical products are usually investigated using experimen- 
tal methodologies: in vitro and ex vivo experiments, in vivo animal experimentation, or 
in vivo human experimentation. In silico Methodologies are a valid alternative to these 
experimental methodologies. Using terminology that was first used to categorise alter- 
natives to animal experimentation, In Silico methodologies can be used to reduce the 
experiment (fewer bench tests, fewer animals enrolled, fewer patients enrolled), refine 
the experiment (reduce the suffering of animals, reduce risks for humans, improve the 
ability of pre-clinical studies to predict the clinical outcome, generalise the experimental 
finding, etc.), and replace the experiment (replace the experiment entirely). This produces 
a 3 x 3 taxonomy (Table 1.1), which will be used in the remainder of this document. 
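
The 3 x 3 structure described above can be written down as a small data structure. The following Python sketch is only an illustration: the class names `Experiment` and `Role` and the function `taxonomy_cells` are hypothetical and do not come from this report. It simply enumerates the nine combinations of experimental methodology and role.

```python
from enum import Enum
from itertools import product


class Experiment(Enum):
    """The three experimental methodologies named in the text (hypothetical names)."""
    IN_VITRO_EX_VIVO = "preclinical in vitro/ex vivo experiments"
    ANIMAL = "preclinical animal experiments"
    HUMAN = "clinical human experiments"


class Role(Enum):
    """The three ways an in silico methodology can act on an experiment."""
    REDUCE = "reduce"
    REFINE = "refine"
    REPLACE = "replace"


def taxonomy_cells():
    """Return the nine (experiment, role) pairs of the 3 x 3 taxonomy."""
    return list(product(Experiment, Role))


if __name__ == "__main__":
    for experiment, role in taxonomy_cells():
        print(f"{role.value:8s} {experiment.value}")
```

A Context of Use could then be tagged with one such pair, for example `(Experiment.ANIMAL, Role.REFINE)`; that tagging convention is likewise an assumption made for illustration.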


Table 1.1 Taxonomy of In silico methodologies

| | Reduce | Refine | Replace |
|---|---|---|---|
| Preclinical in vitro/ex vivo experiments | Reduce the number or duration of in vitro/ex vivo experiments | Improve the predictive accuracy of safety and/or effectiveness provided by the in vitro or ex vivo experiment | Replace a portion or all the required in vitro or ex vivo experiments |
| Preclinical animal experiments | Reduce the number of animals involved in the experiment, or its duration (adoption of sustainability principles) | Alleviate the suffering of the animals involved, or improve the predictive accuracy of the safety and/or effectiveness provided by the animal experiment (solving or acknowledging animal protection issues) | Replace animal experiments used for the prediction of the expected safety and/or effectiveness of a new treatment during clinical experimentation |
| Clinical human experiments | Support the design of clinical experiments. Reduce the number of clinical studies, their duration, or the number of subjects involved. Solving scarcity on patients population related to rare diseases and where patients are children | Reduce the risks for the humans involved or improve the predictive accuracy of the safety and/or effectiveness provided by the human trials | Replace human experiments used for the prediction of the expected safety and/or effectiveness of a new treatment |


1.2 The Critical Elements of a Good Simulation Practice Standard


The critical elements that any reflection on the GSP should address are the theoretical
foundations, the development and credibility assessment of the models, the possible regulatory
and Health Technology Assessment (HTA) pathways, the ethical review process
when In silico methodologies are involved, and the role of the sponsors and the investigators.
These aspects will be addressed in the following chapters; here below, we provide
a brief summary.


1.2.1 Theoretical Foundations of Good Simulation Practice


Regulatory science focuses on problems very close to clinical application. Thus, in
general, its practitioners are not interested in the more fundamental aspects treated by
mathematics, philosophy of science, and epistemology (study of human knowledge).
However, the extreme interdisciplinarity involved with computational modelling and simulation
in the development and de-risking of medical products makes it difficult for every
single group of experts to use the epistemological guidelines accepted and established in
the practice of their discipline. Having solid theoretical foundations helps in these cases to
find common ground across different disciplines and epistemologies. The goal of Chap. 2
is to provide such foundations.
1.2.2 Model Development


A computer model is, first and foremost, a software artefact; as such, it must be developed 
and tested using the quality assurance principles in software engineering. While this is a 
relatively mature topic for regulatory science, which has been specialised for biomedical 
applications with the introduction of the so-called Software as a Medical Device category 
of medical devices, there are some specificities of providing quality assurance for software 
with predictive purposes that require specific treatments in a future GSP standard. In 
Chap. 3, we analyse this topic in full detail. 


1.2.3 Model Credibility 


Even if a model has been developed with the highest possible quality standard, this does 
not guarantee that the predictions this model provides can be trusted per se. Assessing
the credibility of a model's predictions is a problem that has been addressed
in the regulatory science of high-risk products such as nuclear power plants or passenger
aircraft. Yet, in the biomedical domain, this is a very recent topic.

Annexe 1 provides an overview of all regulatory documents that address this problem. 
Still, even the most recent efforts, such as the ASME VV-40:2018, leave an ample portion
of the territory untouched. VV-40:2018 targets the development of medical devices,
leaving out drug development and the development of Advanced Therapy Medicinal
Products (ATMPs). The classic verification, validation, and uncertainty quantification
(VVUQ) approach that VV-40:2018 refers to is robustly
defined for purely mechanistic models, i.e., models built exclusively from widely accepted
theories; however, many predictors are now built using data-driven methodologies, where
no theory is involved. Furthermore, in practice, most models are so-called grey-box models,
built by combining mechanistic and empirical knowledge. In Chap. 4,
we provide a systematic discussion of the topic.


1.2.4 Possible Regulatory Pathways 


The regulatory assessment of In silico methodologies does not fit well with the traditional 
separation between drugs and medical devices. It must include elements of technical val- 
idation more common in the regulatory pathways of medical devices, but also elements 
of clinical validation more common in the regulatory pathways of medicinal products. In 
Chap. 5, we explore the issue of which regulatory pathway is most suitable to qualify In 
silico methodologies to be used in the regulatory assessment of new medical products. 
We describe four possible pathways and discuss their pros and cons. 
1.2.5 Possible Health Technology Assessment Pathways


In silico methodologies can play an essential role in the marketing authorisation of new 
medical products, their cost-benefit assessment, the definition of prescriptive appropriate- 
ness, and post-marketing surveillance. In Chap. 6, all these aspects are considered and 
discussed with concrete examples. 


1.2.6 Ethical Review of In Silico Methodologies 


Before it starts, every experimental study on humans must be reviewed by an independent
organisation known in Europe as an Independent Ethics Committee and in the USA as
an Institutional Review Board. Chap. 7 explores if and how such a review process needs to
change when In silico methodologies are involved. 


1.2.7 The Role of the Sponsor in In Silico Methodologies 


The sponsor is “an individual, company, institution, or organisation which takes responsibility
for initiating, managing, and/or financing a clinical trial”.⁷ The sponsor plays a vital
role in conventional trials, codified in detail in various standards and guidelines, such as 
the Good Clinical Practice. Chap. 8 explores how such a role needs to be extended when 
In silico methodologies are involved. 


1.2.8 The Role of the Investigator in In Silico Methodologies 


The investigator is another role that needs to be partially redefined when the clinical 
evaluation of a new medical product involves In silico methodologies. In a clinical study, 
the Investigator is the person involved in running the study. The Investigator may help 
prepare and carry out the study, monitor the study safety, collect and analyse the data, and 
report study results. When In silico methodologies are involved, the Investigator is also 
responsible for carrying out the modelling tasks and generating the In silico evidence. 
Chap. 9 explores how these additional responsibilities change the Investigator's profile 
and role. 


7 https://toolkit.ncats.nih.gov/glossary/clinical-study-sponsor/. 
1.3 Essential Good Simulation Practice Recommendations


— In silico methodologies can be categorised depending on how they are used as an alter- 
native to experimental methodologies: to refine, reduce, and replace in vitro, animal, 
or human experimentation. 
2 Theoretical Foundations of Good Simulation Practice


Marco Viceconti, Miguel Juarez, Axel Loewe, Daniela Calvetti, 
Erkki Somersalo, Liesbet Geris, Marc Horner, 
and Martha De Cunha Maluf-Burgman 


2.1 Introduction 


This position report aims to support future standardisation efforts on Good Simulation 
Practice. Good practice standards are usually a summary of best practices, collected 
empirically and consolidated through a consensus process among practitioners. As such, 
they are the least theoretical artefact that can be expected in regulatory science. It
therefore requires some explanation why we decided to add a chapter on some of the
theoretical foundations supporting the concepts in the following chapters.

As already mentioned, regulatory best practices emerge through consensus among 
practitioners. This implies that such practitioners are culturally relatively homogeneous
and share the same vocabulary. Even more important, they share a common epistemology,
the principle around which humans establish new knowledge, in this case, knowledge on
the safety and efficacy of new medical products. This is one of the reasons why the regulatory
assessment of medicinal products and medical devices remains separated, despite
the increasingly frequent combination products; each class of products has its own vocabulary,
expertise, and epistemology.

Nevertheless, there are also commonalities. For example, the whole regulatory science 
is formulated as purely empirical, where experimental evidence and real-world observa- 
tions are considered the only source of reliable information. Introducing modelling and 
simulation in the regulatory process raises several epistemological challenges. The pri- 
mary one is that CM&S evidence is predicted, not observed. Such predictions can be 
based on well-accepted theories that have resisted extensive falsifiability efforts, theories 
that might still be debated, and even purely phenomenological observations on a large 
volume of observational data. It is quite clear that a predictive model and a controlled 
experiment are different ways to investigate physical reality, but how they differ is debat- 
able. Even more complex is the definition of a formal process to establish the truth content 
of a model’s prediction (what we call here "credibility"). 

Last but not least, the introduction of computational modelling and simulation requires
adding to the expert panels that develop good simulation practice by consensus entirely
new expertise, such as applied mathematics, computer science, software engineering, and
a whole territory of engineering science sometimes referred to as Modelling and Simulation
in Engineering. But this creates a group of experts with different backgrounds,
terminologies, and even epistemologies. This is why the discussion around the regula- 
tory acceptance of in silico methodologies is so complex; the involved experts struggle to 
communicate and collaborate effectively. 

There is no easy solution to this problem. People with different expertise and back- 
grounds will have to try to talk to each other and try to understand the other points of 
view. But in such a complex debate, we believe it is essential to have some theoretical 
foundations to which we can resort. Thus, contrary to all other chapters, this does not 
directly contribute to the regulatory science debate on the GSP. As such, it might not be 
of particular utility to the regulators, although it may serve as an indirect nexus between 
the regulatory and the CM&S sciences. However, we believe it is a necessary element of 
such a document that might prove useful in some complex discussions that the consensus 
process will inevitably impose. 
2.2 What is a Model in Science?


"A model is an invention, not a discovery" (Massoud et al., 1998). The Stanford Ency- 
clopaedia of Philosophy devotes an entire chapter to the non-trivial question in the heading 
(Frigg & Hartmann, 2020). For the purposes of this chapter, a useful definition is: “Models 
are finalised cognitive constructs of finite complexity that idealise an infinitely complex por- 
tion of reality through idealisations that contribute to the achievement of a knowledge on that 
portion of reality that is reliable, verifiable, objective, and shareable” (Viceconti, 2011). 
Models are a way that we humans think about the world. In science, models idealise a 
quantum of reality: 


— To memorise and logically manipulate quanta of reality (Descriptive models) 

— To combine our beliefs on different quanta of reality in a coherent and non- 
contradictory way toward the progressive construction of a shared vision of the world 
(Integrative models) 

— To establish causal and quantitative relationships between quanta of reality (Predictive 
models) 


Predictive models are used in science primarily for two purposes: 


— as tools used in the development and testing of new theories 
— as tools for problem-solving 


For this second purpose, we define the credibility of a predictive model as its ability
to predict causal and quantitative relationships between quantities in the natural phe- 
nomenon being modelled, as measured experimentally. Thus, the first foundational aspect 
of a model's credibility is the complex relationships that predictive models have with 
controlled experiments. 


2.3 A Short Reflection on the Theoretical Limits of Models 
and Experiments 


Nature is infinitely complex and its mere observation, while useful to formulate explana- 
tory hypotheses of why a certain phenomenon occurs, is not sufficient to test whether such 
hypotheses are true. To attempt to falsify an explanatory hypothesis, we need
real-world observations or a controlled experiment, or experiment for short. In an experiment,
we intentionally perturb the system under investigation and observe how it responds
to this perturbation. By controlling some of the variables that describe the system's state
and observing how other state variables change, we can reject all hypotheses that are
inconsistent with the results; the hypothesis that resists all our falsification attempts is
tentatively assumed to be true.
Controlled experiments are extremely challenging in life sciences because of the complexity
and entanglement of living organisms. The most realistic experiment is the one
where we merely observe the system, but even in that case, because of the observer
effect, by the simple act of observing the system, we perturb it; thus, human beings
cannot achieve a hundred percent (100%) realism. As soon as we perturb the system of
interest, what we observe is not the system per se but an experimental model of that system.
In other words, even an observational study is a model of reality. As soon as we
investigate reality with a model (which we believe is always the case), the key question
is the "Degree of Analogy" between the model and the reality being modelled: How closely
does the model capture the functional aspects of the reality that we are trying to understand?
It might look completely different, but if it works like the portion of reality under
investigation, it is a good model.

A major advantage of experimental models is that their Degree of Analogy with the 
reality they model can be inferred from how they were built. Every experimental model 
contains a fraction of physical reality. The bigger this fraction, the higher the Degree of 
Analogy of the experimental model. 

Too frequently in medicine, we confuse analogy with homology: biological systems 
are homologous if they have evolved from the same origin or from a common ancestor, 
regardless of their function. As such, we consider mice as experimental models of humans 
because both are terrestrial vertebrates with common ancestors. But a mouse might be 
farther from a human than a fruit fly, for a specific physiological function. 

However, there is unquestionably a relationship between analogy and homology. The 
closer our experimental model is to the reality we want to investigate, the more likely the 
model will have a strong analogy with such reality. Therefore, even if it is done because
of homology and not of analogy, a randomised clinical trial of a new drug is
in general more analogous to the reality of the use of that drug in clinical practice than
an animal study on the efficacy of that drug, which in turn is more analogous than an 
in vitro experiment in cell culture. This might not always be the case, but it frequently is. 

Thus, we can infer the Degree of Analogy an experimental model has with the portion
of reality we are investigating by analysing how the experiment was built. The more
controlled the experiment, the heavier the perturbation we make to the physical reality and
the lower the Degree of Analogy. Thus, experimental models trade off their controllability
against their Degree of Analogy, which can be inferred from how the experiment was
built.

It should be noted here that the controllability of an experiment in the context of life 
science is not only limited by the trade-off with the Degree of Analogy. Living organisms 
are very complex and highly entangled, which means that perturbing one specific aspect 
will often impact other aspects, sometimes in fairly unpredictable ways. To this, we need 
to add all the ethical limits of animal and human experimentation. Sometimes the optimal 
experimental design is not possible for ethical reasons. 
There is another way to build models of reality. As introduced above, models can be
defined as “finalised cognitive constructs of finite complexity that idealise an infinitely com- 
plex portion of reality through idealisations that contribute to the achievement of knowledge 
on that portion of reality that is objective, shareable, reliable and verifiable” (Viceconti, 
2011). If we accept this definition, then models can be built not only by perturbing/ 
manipulating the physical reality we want to investigate (experimental models) but also 
by any other type of idealisation process. Here, we are interested in “in silico” models 
built through computational modelling and simulation of specific idealisation processes. 

The idealisation processes we use to build in silico models can differ greatly. For 
example, statistical inference models are built through inductive reasoning framed in a 
frequentist or Bayesian theory of probability; biophysical mechanistic models are built by 
deductive reasoning starting from tentative knowledge that has resisted extensive attempts
at falsification (laws of physics). While these differences will become vital in other
chapters, here it will suffice to recognise that in silico models are built through some 
idealisation process. 

We notice two significant differences when comparing in silico and experimental mod- 
els. The first is that the Degree of Analogy an in silico model has with the reality under 
investigation cannot be inferred by how the model was developed. Since there is no 
grounding with the physical reality typical of experimental models, the degree of analogy 
must be demonstrated for each in silico model. 

This is a major shortcoming of in silico models, which would almost always make 
us prefer experimental models if not for another important difference: the controllability 
of in silico models is entirely independent of the Degree of Analogy. This means that 
we could, in principle, consider the use of in silico models to reduce, refine, and replace 
experimental models when it is possible to demonstrate their Degree of Analogy with 
the reality being modelled and when that Degree of Analogy is higher than that offered 
by experimental models with similar levels of control. The second motivation for using 
in silico models to reduce, refine and replace experimental models is when for the same 
Degree of Analogy and the same level of controllability, in silico models can provide the 
required answer faster and/or at a lower cost. A third motivation comes from the observa- 
tion that even for experimental studies within the currently accepted ethical boundaries, 
every animal and human experiment has an ethical cost that should be minimised as much 
as possible. 

We can infer the Degree of Analogy of experimental models simply by how they are 
built; all we need to do is to quantify their validity and reliability. On the contrary, with 
in silico models, we must demonstrate that an in silico model has the necessary Degree 
of Analogy for each Context of Use before we can use it to reduce, refine, or replace 
experimental models. 


2.4 Models for Hypothesis Testing, Models for Problem-Solving 


In the previous section, we introduced experimental models as a necessity of the scientific 
method, which requires that each hypothesis born out of the observation of a natural phe- 
nomenon is relentlessly challenged with controlled experiments designed to falsify this 
hypothesis. This is the classic use of models in fundamental science when the goal is 
to increase our knowledge of the world around us. But there is another use for models, 
whether experimental or in silico: problem-solving. In his famous book “All Life is 
Problem Solving” (Popper, 1994), Karl Popper insists on using tentative scientific knowledge 
to solve problems affecting human life, including healthcare. 

All our reflections in this position report are related to the use of models for 
problem-solving, and in particular, to a specific class of problem: determining, before 
its widespread use, if a new medical product is sufficiently safe and effective to justify its 
marketing authorisation. 

While in knowledge discovery, the focus is on the falsifiability of hypotheses, in 
problem-solving, we assume that the knowledge used to build our predictive models (if 
any) is tentatively true. However, this does not automatically imply that the model pre- 
dictions will be accurate; several factors may introduce errors in the prediction, which 
we will detail in the next section. Therefore, it is necessary to systematically assess the 
Degree of Analogy before a predictive model is used in a mission-critical context (e.g., a 
predictive model of a medical device or medicine that is intended to save a patient's life). 

Another related dichotomy frequently used to separate statistical models from machine 
learning models is between inference and prediction. Inference aims to generalise for an 
entire population the properties observed in a sample of such a population. The purpose 
of inference models is representational in nature. On the other hand, the purpose of 
predictive models is predictive in nature: their predictions aim to forecast unobserved 
data, such as future behaviour (e.g., in the business context, predictive modelling uses 
known results to create, process, and validate a model that may be used to forecast 
future outcomes in a specific context of use). While inference is backed by a robust 
mathematical theory (probability theory) and, in particular, by the Law of Large Numbers, 
which has resisted extensive attempts at falsification, this theory does not necessarily 
apply to data-driven predictive models, which makes the evaluation of the Degree of 
Analogy for data-driven models epistemologically challenging. 


2.5 Assessing the Degree of Analogy of a Model: Evidence 
by Induction 


The predictive accuracy of a model can be estimated by comparing its predictions to 
the results of a matching controlled experiment. Matching here means that the model 
should be informed with a set of inputs that quantify the independent variables of the 
controlled experiment, i.e., the quantities we control in the experiment. By doing so, we 
assume that the model is associated with that specific experiment. Thus, the predictive 
accuracy (for that particular set of inputs) is the degree of agreement between the values 
of the dependent variables measured in the controlled experiment and the same values 
as predicted by the model. This activity is usually called experimental validation of a 
predictive model. It should be noted that classic validation studies expect 
the errors affecting the measurement methods used in the experiment to be negligible 
when compared to those affecting the model's prediction; this allows the assumption that 
the measured value is "true" and the difference between prediction and measurement is 
due to the errors affecting the model. When this is not the case, comparing the model to 
the experiment becomes much more complex. 

There is a major issue with this approach in that it is inductive in nature. By validating 
the model with one experiment, we estimate its predictive accuracy for those input values. 
This only allows us to say that the model has a certain accuracy when used to predict a 
specific condition described by those input values. A priori, nothing can be said about the 
model accuracy for other input values. Of course, another validation experiment can be 
performed, followed by calculating the model's predictive accuracy for a second input 
set. Again, this will extend our validity statements only to this second condition. We 
can do many validation experiments and try to build by induction a general validity for 
our model, or we can look at the nature of the predictive error the model being tested 
exhibits and find patterns and regularities. 

The analysis of how the prediction error is composed is more commonly used in the 
validation of mechanistic, knowledge-based predictive models. In contrast, the validation 
by induction is typical for data-driven predictive models. The separation of the predictive 
error into its numerical, epistemic, and aleatoric components is the central motivation for 
the so-called Verification, Validation, and Uncertainty Quantification (VVUQ) process 
(Viceconti et al., 2020b). 


2.6 The Theoretical Framing of VVUQ 


VVUQ developed within engineering sciences as an empirical practice without clear the- 
oretical foundations. This may sound surprising, but historically also the most important 
numerical methods in engineering, like finite element analysis, were first developed as 
empirical methods and only later found a theoretical framing as a special case of the 
Galerkin method. As with all practices, the meaning of VVUQ may vary among practitioners. 
Also, VVUQ is frequently used in engineering science without asking why such a process 
should inform us better about the credibility of knowledge-based predictive models than 
any other approach. 

However, for the purpose of this chapter, it is important to make explicit the theoretical 
framing that supports the use of VVUQ. This is because, as we will see, this approach 
relies on several assumptions, which might not always be true when the evaluated model 
predicts complex living processes. Here, we provide a summary; full details can be found 
in Viceconti et al. (2020b). 

There are three possible sources of predictive error in a knowledge-based model: 


— the numerical error we commit by solving the model's equations numerically; 

— the epistemic error that we commit due to our incomplete, idealised, or partially 
fallacious knowledge of the phenomenon being modelled; 

— the aleatoric error due to the propagation of the measurement errors that affect all our 
model inputs. 


If we compare a model's prediction to the result of a controlled experiment, we will 
observe a difference caused by all these errors. The VVUQ process aims to separate 
these three components of the predictive error. If the model is solved appropriately, then 
we expect the numerical error to be negligible compared to the other two. We expect the 
aleatoric error to be comparable to the measurement errors that affect the model's inputs. 
If this is not the case, the model might have mathematical or numerical instabilities. 
In other words, we want to be reassured that the epistemic error is the predominant 
component of the predictive error. 

Verification activities aim to quantify the numerical error. At the risk of oversimplify- 
ing, verification tests the model with special input values where epistemic and aleatoric 
errors are exactly null or asymptotically convergent to null. While the verification is per- 
formed for these special input values, because it is generally true that numerical errors 
are independent of or only weakly dependent on the inputs, we assume that the numerical 
errors found with those special input values will remain roughly the same for any other 
input value. 
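
A verification test of this kind can be sketched as follows. This is a minimal illustration and not a procedure prescribed in this chapter: it assumes a hypothetical toy model, dy/dt = -k*y, solved with an explicit Euler scheme, for which the exact solution is known. Because the input is chosen so that the exact answer is available, epistemic and aleatoric errors are null by construction, and the whole difference is numerical error. The function names and parameter values are assumptions made for the example.

```python
import math

def euler_decay(y0: float, k: float, t_end: float, n_steps: int) -> float:
    """Solve dy/dt = -k*y with explicit Euler and return y(t_end)."""
    dt = t_end / n_steps
    y = y0
    for _ in range(n_steps):
        y += dt * (-k * y)
    return y

def numerical_error(n_steps: int, y0: float = 1.0, k: float = 2.0,
                    t_end: float = 1.0) -> float:
    """Absolute difference between the numerical and the exact solution."""
    exact = y0 * math.exp(-k * t_end)
    return abs(euler_decay(y0, k, t_end, n_steps) - exact)

# Refine the time step and record how the numerical error shrinks.
# For a first-order scheme, halving the step should roughly halve the error.
errors = [numerical_error(n) for n in (100, 200, 400, 800)]
ratios = [errors[i] / errors[i + 1] for i in range(len(errors) - 1)]
```

If the observed ratios did not approach the theoretical order of the scheme, this would point to an implementation problem rather than to a limitation of the underlying knowledge.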

Uncertainty quantification explores how the experimental errors affecting the model 
inputs propagate within the model and affect the predicted values. The input values are 
perturbed according to the probability distribution of the experimental error affecting 
them, and the variance induced in the predicted outputs is recorded. Uncertainty quantifi- 
cation directly estimates the aleatoric error for a specific set of input values. It is usually 
assumed that how the error due to the inputs’ uncertainties propagates into the model's 
predictions is independent of the specific values of the inputs. In other words, it is usu- 
ally tested how the variance of the inputs around a single set of average input values 
propagates. 
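
The perturbation procedure described above can be sketched as a simple Monte Carlo propagation. The example below is only illustrative: the toy model (a linear spring), the input names, their mean values, and the assumption of Gaussian measurement errors are all hypothetical choices, not taken from this chapter.

```python
import random
import statistics

def model(load: float, stiffness: float) -> float:
    """Hypothetical toy model: displacement of a linear spring."""
    return load / stiffness

def propagate_uncertainty(mean_inputs, std_inputs, n_samples=20_000, seed=0):
    """Perturb the inputs according to their (assumed Gaussian) measurement
    error and return the mean and standard deviation of the predictions."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_samples):
        load = rng.gauss(mean_inputs["load"], std_inputs["load"])
        stiffness = rng.gauss(mean_inputs["stiffness"], std_inputs["stiffness"])
        outputs.append(model(load, stiffness))
    return statistics.mean(outputs), statistics.stdev(outputs)

# Propagate a 2% measurement error on each input around one set of average values.
mean_out, std_out = propagate_uncertainty(
    mean_inputs={"load": 100.0, "stiffness": 50.0},
    std_inputs={"load": 2.0, "stiffness": 1.0},
)
```

Here the spread of the outputs (`std_out`) plays the role of the aleatoric error estimate for this single set of average input values; repeating the analysis around other input sets would test the assumption that the propagation does not depend on them.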

Validation activities rely on two assumptions. First, the numerical errors are negligible 
compared to the other two sources of error. Second, the aleatoric error is normally 
distributed around a null mean. If both assumptions hold, then when we calculate the 
predictive error as an average (e.g., root mean square average) over multiple experiments, 
the effect of the numerical errors will be negligible and the aleatoric errors will tend to 
cancel out, leaving the average predictive error as a good estimate of the epistemic error. 

The last step in the VVUQ process is the so-called applicability analysis. While we 
tend to assume that numerical and aleatoric errors do not depend on the specific values of 
the inputs, such an assumption cannot generally be made for the epistemic error. On the 
contrary, it is expected that any idealisation holds within a limit of validity, and as we get 
closer to those limits, the epistemic error will increase. There are various approaches to 
evaluating the applicability of a model. Still, most rely on one fundamental assumption: 
if two input sets are similar, the two output sets will also be similar. Suppose the model 
tends to show similar epistemic errors for all tested inputs. In that case, we can consider 
that the epistemic error will also be similar for all other input values within the range of 
values tested during the validation. 

Additionally, the reliability of the epistemic error estimate obtained during the valida- 
tion activities decreases the further the model inputs drift outside the range of values tested 
in the validation. Another issue to consider, as mentioned above, is that every mechanistic 
model relies on theories, and every theory has limits of validity. The model’s predictive 
accuracy can degrade considerably as the inputs reach these limits. 
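
One crude way to flag when a model is being used outside the range of validated inputs is a per-variable range check, sketched below. This is only an assumption-laden illustration: the helper name, the input variables, and the numbers are hypothetical, and a bounding-box test says nothing about the limits of validity of the underlying theory.

```python
def within_validated_range(new_input, validated_inputs):
    """Return True if every component of new_input lies within the min/max
    of the corresponding component across the validated input sets."""
    for i, value in enumerate(new_input):
        tested = [x[i] for x in validated_inputs]
        if not (min(tested) <= value <= max(tested)):
            return False
    return True

# Hypothetical validated input sets, e.g. (body mass in kg, dose in mg/kg).
validated = [(60.0, 1.0), (75.0, 1.5), (90.0, 2.0)]

inside = within_validated_range((80.0, 1.2), validated)    # inside the tested range
outside = within_validated_range((120.0, 1.2), validated)  # body mass was never tested
```

A prediction requested for the second input would fall outside the validated range, so the epistemic error estimated during validation should not be assumed to hold for it.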


2.7 Levels of Credibility Testing 


The combination of VVUQ and applicability analysis extends the concept of the model's 
credibility to combinations of input values that have not been experimentally validated. 
However, the issue of assessing if a predictive model is credible enough for a specific 
context of use has two additional aspects: the level of credibility at which we test the 
model and the minimum predictive accuracy below which we must reject the use of the 
model. The level of credibility testing is not an attribute of the model; instead, it is the 
expectation of the model's predictive accuracy, which we define by choosing against what 
we calculate the model's predictive accuracy. Three possible levels are shown in Fig. 2.1, 
for the specific case of a model that predicts a single output set (as opposed to models 
that predict entire distributions of possible values): 

Fig. 2.1 Definition of the levels of credibility for a predictive model 


— At the lowest level of credibility testing (L1), models aim to predict a value within the 
range of values observed experimentally over a population. Here, the predictive accu- 
racy is measured in terms of the probability that the predicted value for each Quantity 
of Interest (‘QoI’) is a member of the population of values measured experimentally. 

— The second level of credibility testing (L2) expects the model to accurately predict 
some central properties (e.g., the average) of the distribution of values observed exper- 
imentally over the population. Here, the predictive accuracy is quantified by measuring 
the distance for each QoI between the predicted value and the average of the values 
measured experimentally. 

— Lastly, the highest level of credibility testing (L3) expects the model to accurately 
predict the observed value for each member of the population. Here, the predictive 
accuracy is calculated as a p-norm of the vector of differences between the predicted 
value and the measured value for each member of the population. A 2-norm is commonly 
used (root mean square error). A more restrictive infinity-norm may also be used, where 
the measure of the error is the maximum error found among all members of the 
population. 
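
The three levels can be made concrete with a small sketch. It is only an illustration under simplifying assumptions: the function names and the data are hypothetical, and the L1 check is reduced to a test of whether the prediction falls inside the observed range, rather than a full probability of membership.

```python
import math
import statistics

def l1_within_population(predicted: float, measured: list[float]) -> bool:
    """L1 (simplified): is the prediction inside the observed range?"""
    return min(measured) <= predicted <= max(measured)

def l2_distance_from_mean(predicted: float, measured: list[float]) -> float:
    """L2: distance between the prediction and the population average."""
    return abs(predicted - statistics.mean(measured))

def l3_errors(predicted: list[float], measured: list[float]) -> tuple[float, float]:
    """L3: root mean square error and maximum (infinity-norm) error."""
    diffs = [p - m for p, m in zip(predicted, measured)]
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    max_error = max(abs(d) for d in diffs)
    return rmse, max_error

measured = [10.0, 12.0, 11.0, 13.0]          # hypothetical QoI, one value per subject
subject_specific = [10.5, 11.5, 11.0, 14.0]  # hypothetical subject-specific predictions

l1 = l1_within_population(11.8, measured)
l2 = l2_distance_from_mean(11.8, measured)
rmse, max_error = l3_errors(subject_specific, measured)
```

The same model can therefore pass an L1 or L2 test and still show a large maximum error at L3, which is why the level of testing has to be chosen together with the context of use.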


While this taxonomy of the level of credibility testing is not considered in any current 
regulatory document, we recommend it be considered in future guidelines and standards. 


2.8 The Conundrum of Validating Data-Driven Models 


Model credibility frameworks based on VVUQ plus applicability were developed having 
in mind models built starting from a causal explanation of the phenomenon of interest 
(mechanistic models). By considering epistemic errors, VVUQ-based credibility accepts 
that the prior knowledge we used to build the model might be inaccurate, but it is always 
present. And in most cases, such knowledge is expressed with mathematical forms whose 
properties summarise such knowledge. For example, all theories expressed with differen- 
tial equations implicitly assume that the variation of the quantities of interest over space 
and/or time occurs smoothly. This, in turn, derives from an essential physical knowledge 
of the conservation of mass, momentum, and energy. In fact, many of the implicit 
assumptions behind the use of VVUQ to assess a model's credibility, which we listed in 
the previous sections, are usually valid under such conditions. 

But this raises an important question: can credibility assessment based on VVUQ plus 
applicability be used also for models that are not built with some prior knowledge (here- 
inafter referred to as ‘data-driven models’)? The short answer is no; here, we provide 
some theoretical justifications for this conclusion. 

In probability theory, if we are sampling some population properties, the Central Limit 
Theorem (‘CLT’) tells us that the distribution of the sample mean will eventually converge 
to a normal distribution. The Law of Large Numbers (‘LLN’) states that with enough 
samples, the estimates 
of certain properties of the probability distribution, such as average or variance, will 
asymptotically converge to the true values for that population. This guarantee of asymp- 
totic convergence makes it possible to infer the properties of a distribution from a large 
but finite number of samples. 
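
The asymptotic convergence promised by the LLN can be observed numerically. The sketch below is a hypothetical illustration: it draws samples from an exponential distribution with a known mean (an arbitrary choice) and reports how far the sample mean is from the true value for increasing sample sizes.

```python
import random
import statistics

def sample_mean_error(n_samples: int, true_mean: float = 1.0, seed: int = 0) -> float:
    """Draw n_samples from an exponential distribution with the given mean
    and return the absolute error of the sample mean."""
    rng = random.Random(seed)
    samples = [rng.expovariate(1.0 / true_mean) for _ in range(n_samples)]
    return abs(statistics.mean(samples) - true_mean)

# The error of the estimate is expected to shrink as the sample grows,
# although not necessarily monotonically for any single random sequence.
errors_by_size = {n: sample_mean_error(n) for n in (10, 1_000, 100_000)}
```

Note that this guarantee concerns the estimate of a property of the sampled population; whether a similar guarantee holds for the prediction error of a data-driven model is the question discussed next.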

Let us now consider the use of a statistical model as a predictor. Here, using statistical 
inference, one can test the hypothesis that the value of the dependent variable Y 
can be predicted given the values of a set of independent variables [X_1, ..., X_n], so that 
Y = f(X_1, ..., X_n). Here, for simplicity of treatment, we assume that the variables X_i can 
be quantified without any uncertainty. By inferring the relations and correlations between 
X_i and Y, one can build an estimate of f(), called f'(), which can be used to predict Y for 
combinations of X_i that have not yet been observed experimentally. If the LLN 
holds, it is sufficient to have a finite number of observations [X_i, Y_i] to build f'(). 

But if we now want to quantify the predictive accuracy by comparing the predicted value 
Y' = f'(X) to that observed experimentally (Y) for a finite number of X_i sets, 
does the LLN still apply? Given a large enough set of validation experiments where 
P(Y | X_i) is observed, is there a theoretical foundation to assume that the estimate of the 
average prediction error e' = ave(Y | X_i) tends to the true value e that would be obtained if 
we could validate the predictor with an infinite number of experiments? Does the average 
prediction error estimate tend asymptotically to the true average prediction error? 

When estimating, we learn about the characteristics of a population by taking a sample 
and measuring those characteristics. The fact that we have a sample brings about variabil- 
ity (uncertainty), normally described by a probability distribution whose parameters are 
related to the characteristic of interest. Usually, the more information we have about the 
characteristic (the larger the sample size), the greater the accuracy (estimating the correct 
value of the characteristic) and precision (decreasing the uncertainty) of the estimation. If 
some very mild conditions apply, we can assume the variability in the estimators follows 
a normal distribution: 

$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) $$

where x is the measured quantity, μ is the mean, and σ is the standard deviation. 

Now we must consider an extra source of uncertainty if the objective is to make 
predictions. This changes the nature of the statistical problem. The formal problem 
is to analyse P(Y | X), with Y the characteristic of interest and X all the relevant data 
available. Of course, Y and X are related formally by a set of parameters relevant to X and 
Y; for simplicity, formalise them as g(X | A) and g(Y | B), with B = B(A) an invertible 
function. So, one can learn about A (and hence B) from X, and that knowledge is described 
by f(B | X), which can be Gaussian and with increasing precision and accuracy as above. We 
can then use this knowledge to inform g(Y | B), but there is still a source of uncertainty in 
g(Y | B) that cannot be reduced further, even if B is known exactly; moreover, the shape of 
g is not guaranteed to be normal at all. 

There are alternative ways to include the information about B in g(Y | B). But in any 
case, the law of iterated expectations can provide a formal interpretation. The expected 
value of Y | X can be calculated as the expected value, over B given X, of the expected value 
of Y | X, B. The more you learn about B from X, the better the estimation of the mean of Y. 
Thus, larger sample sizes yield more accurate predictions. However, this is not necessarily 
the case for the precision of the prediction. The variance of Y | X can be calculated as the 
expected value of the variance of Y | X, B plus the variance of the expected value of 
Y | X, B. The second term decreases with the sample size, but the first does not and depends 
on the distribution of Y | B. Therefore, validating a predictive model must take both sources 
into account. 
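
In symbols, the two identities invoked in this paragraph can be written as follows. This is a sketch in standard notation, which is an assumption of this example rather than the notation of the original text; the outer expectation and variance are taken over B with respect to f(B | X).

```latex
\begin{align}
\mathbb{E}[Y \mid X] &= \mathbb{E}_{B \mid X}\big[\,\mathbb{E}[Y \mid X, B]\,\big], \\
\operatorname{Var}(Y \mid X) &=
  \underbrace{\mathbb{E}_{B \mid X}\big[\operatorname{Var}(Y \mid X, B)\big]}_{\text{irreducible, set by } g(Y \mid B)}
  + \underbrace{\operatorname{Var}_{B \mid X}\big(\mathbb{E}[Y \mid X, B]\big)}_{\text{shrinks as more data inform } B}.
\end{align}
```

The first variance term is the one that does not decrease with the sample size, which is why a larger validation set improves the accuracy of the mean prediction without necessarily improving its precision.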

Let us assume we are interested in predicting a quantity Y, which depends on a set 
of values X. f() is a predictive model that provides an estimate of Y, which we call Y’. 
The concept of model credibility assessment based on VVUQ is that the model f() is 
mechanistically defined, so we know that Y = f(X), and any other variable outside of the 
set X has little or no effect on Y (or the effect is mediated through X). 

An important implication of all this is that the smoothness of the prediction error is not 
guaranteed for data-driven models as it is for mechanistic models. In mechanistic models, 
we can assume that our error e = Y - Y' depends only on X, so if we test f() for X_1 and 
X_2, where X_1 ≈ X_2, the prediction error will be similar, e_1 ≈ e_2. This also means that if 
e(X_1) is the prediction error for X_1, and e(X_2) is for X_2, e(X_i) will be close to e_1 and e_2 
if X_i is close to X_1 and X_2. In other words, if the model is validated for a range of X_i, it 
could safely be assumed that the error will be similar for any other X close to X_i. But this 
cannot be said for data-driven models, such as Machine Learning (ML) models, because 
there is no guarantee that Y is a function only of X. We cannot, as we do with mechanistic 
models, test the ML model for a finite number of cases, and assume that average accuracy 
will not change significantly if more cases are tested. This is pure induction: by testing 
the ML model against ten experiments, it can only be said what the error is for those ten 
cases; the next could show a totally different error, even for an X_i close to the 
ten we already tested. 

This poses two significant problems when the VVUQ approach is used to assess the 
credibility of data-driven models. The first is that while in mechanistic models, the vari- 
ance of Y can be mainly explained with the variance of X, in data-driven models, this is 
not assured. As explained above, X may include variables that have little effect on Y. Also, 
we have no guarantee that all variables affecting Y are included in X. This uncertainty is 
the primary cause of the so-called ‘concept drift’, which sometimes causes a data-driven 
model to perform much worse than it did on the training set when the test is done against 
an independent validation set. 

The second is the lack of smoothness in the prediction error. As explained in Sect. 2.6, 
the applicability analysis presumes that the model's prediction error will vary smoothly 
as the inputs of the model are varied. This makes it possible to assume that if the model 
is used with inputs "near" to the values for which the model has been validated against 
experimental results, the error affecting the prediction will be similar to that quantified 
with the validation. However, such assumptions cannot be made for data-driven models. 

The “Artificial Intelligence and Machine Learning (AI/ML) Software as a Medical 
Device Action Plan” published by the FDA in January 2021¹ explicitly refers to 
introducing a so-called Predetermined Change Control Plan in the US regulatory system. 
Thus, a total product lifecycle (‘TPLC’) regulatory approach to AI/ML-based SaMD is 
designed considering the iterative, adaptive, and autonomous natures of AI/ML 
technologies. Essentially, the idea is that the validation of data-driven models is a continuous process where 
we continuously extend the test set, re-evaluate the model’s predictive accuracy, and then 
regenerate the model using this test set as an extended training set. 

Our reflections would suggest that this approach is not only possible but should be the 
only acceptable approach. In light of this discussion, the idea of a “frozen” data-driven 
model that has been validated using VVUQ procedures developed for mechanistic models 
seems unwise. This conflicts with the obvious need for regulatory approval processes to 
base the decision on a prediction made using a “frozen” model. 


2.9 Conclusions 


Some conclusions can be drawn that can inform the rest of this position report. 

The human mind can investigate reality only through cognitive artefacts we call 
models. Whether we use a mathematical model or a controlled experiment (including 
observational studies), we always deal with models of reality; ultimately, what matters 
is the Degree of Analogy that the model has with the reality being modelled. The main 
advantage of experimental methods over in silico methods is that the Degree of Analogy 
in experimental models can be easily inferred by their design, whereas the Degree of 
Analogy for in silico methods must be assessed on a case-by-case basis. 


1 https://www.fda.gov/media/145022/download. 

When a model is used to make predictions in the context of problem-solving, the 
Degree of Analogy with the reality being modelled becomes the credibility of the model's 
predictions. In general, credibility can be assessed only by induction, so if we quantified 
the predictive accuracy of our model against one hundred (100) experimental measurements, 
then we could only state the credibility of the model under those 100 experimental 
conditions. The number of experimental conditions for which the predictive accuracy needs 
to be tested, called the "solution space", is infinite (∞). However, under certain assumptions, 
we can analyse how the various components of the predictive error (numerical, 
aleatoric and epistemic) vary over the solution space using a process known as VVUQ 
plus applicability analysis. This makes it possible to estimate the predictive accuracy over 
the entire solution space based on a finite number of validation experiments. 

The assumptions that make the VVUQ process possible are usually valid only if the 
tested model is built with some degree of prior knowledge (e.g., a mechanistic model). 
This is not true for data-driven models, which can only be tested by induction. 


2.10 Essential Good Simulation Practice Recommendations 


— The human mind can only understand reality through models. Models are finalised 
cognitive constructs of finite complexity that idealise an infinitely complex portion of 
reality. Their usefulness is measured by their ability to capture the functional aspects 
of interest of the portion of reality that we are investigating. This measure is called the 
Degree of Analogy. 

— In each portion of reality, the functional aspects of interest can be observed 
experimentally or predicted through inductive or deductive reasoning. All these methods of 
investigation are models. However, the Degree of Analogy of experimental models can 
be directly inferred, whereas that of predictive models must be demonstrated by com- 
parisons with controlled experiments. In other words, experiments are not necessarily 
more trustworthy than predictions, but their trustworthiness is easier to assess. 

— Predictive models can be divided into predominantly data-driven models and pre- 
dominantly mechanistic models. In predominantly mechanistic models, the Degree 
of Analogy can be established by decomposing the predictive errors into numerical, 
aleatoric, and epistemic errors through a process known as Verification, Validation, and 
Uncertainty Quantification. But in predominantly data-driven models, the Degree of 
Analogy can only be estimated by induction, using a total product lifecycle regulatory 
approach. 


3.1 A Risk-Based Paradigm of Model Development as a Function 
of Its Context of Use 


Good Simulation Practice implies that a computational model considered for a simula- 
tion task has also been developed according to good practice. In this Chapter, an attempt 
is made to summarise and synthesise good practices in computational model develop- 
ment. A high level of abstraction is needed when considering numerous different model 
types (a recent report of the US Food and Drug Administration (FDA)¹ mentions 39 
different modelling classes). Therefore, this Chapter focuses on model development and 
implementation as a process rather than concrete model-type specific recommendations. 
Generic model definition and design recommendations are addressed in Sect. 3.2.2. 

Whether one develops a predictive model from scratch or from existing libraries and 
solvers, computational model development shares many commonalities with software 
development: 


(a) Models transform user inputs into outputs. 

(b) Models can be developed as standalone units or as part of larger systems/platforms. 

(c) A model’s life cycle is similar to that of software. 

(d) The concrete implementation of a predictive model is often part of a software product. 


Thus, it is reasonable to explore existing standards for Software Life Cycle (SLC) 
management (systems and software engineering) as a starting point for good practices in 
model development, defined according to widely agreed-upon “best ways of doing”,² 
relevant for application and uptake in mission-critical and highly regulated environments. 
Many different programming languages and software development paradigms exist and 
can be used for developing computational models under consideration of process-level 
software development good practice—agnostic of procedural and content-related details. 
We will therefore leverage the similarity between computational models and software to 
map good software development practices onto the former. 

Model developers must acknowledge that their “product” may operate under a “regulated” 
environment and that regulators will perform a benefit-risk assessment. Any regulatory 
effort in a mission-critical domain faces the challenge of balancing the need for the lowest 
possible level of risk for the patient and the economic viability of product development, 
without which no product could be brought to market. Risk assessment, clinical 
evaluation, and validation revolve around the “intended purpose” defined by the Medical 
Device Regulation (MDR). There is a debate around how risk management should be 
implemented in the development of a device: following the concept of risk “As Low As 
Reasonably Practicable” (ALARP) as proposed by ISO 14971:2019 or risk reduced “As 
Far As Possible” (AFAP) as requested by the new EU regulations.³ Concrete regulatory 
recommendations on transferring these concepts to computational models do not yet exist, 
except for an FDA guidance document.⁴ We, therefore, adopt a risk-based approach for the 
development of computational models, where the level of scrutiny (in terms of model life 
cycle management) is proportional to the assessed risk that a predictive model can pose 
(e.g., for patients) according to pre-defined Context(s) of Use (CoU). 

Thus, this Chapter focuses on applying a risk-based approach (Fig. 3.1) for the model 
and simulation software development process, where good practice establishes a minimal 
set of process-level requirements for all models (even low-risk ones). At the same time, it 
requires more comprehensive compliance with relevant industry standards for medium to 
high-risk scenarios, critical applications, and models with substantial impact on regulatory 
decision-making. It is also important to note that this chapter focuses on developing the 
modelling and simulation software platform. In contrast, Chap. 4 focuses on the result 
of this process, which is the actual implementation (i.e., the model) and its credibility 
assessment. 
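
The risk-proportional idea can be pictured as a small lookup: a baseline set of artefacts 
expected for every model, plus additions for high-risk CoUs. The sketch below is only an 
illustration. The function name, the two-tier split and the artefact labels are assumptions 
condensed from Tables 3.2 to 3.6; they are not a normative checklist, and the intermediate 
risk levels of Fig. 3.1 are deliberately left out. 

```python
# Illustrative sketch: artefact labels are condensed from Tables 3.2-3.6,
# and the two tiers ("low", "high") are a simplification of Fig. 3.1.
BASELINE_ARTEFACTS = [
    "requirements definition (CoU, risks, user requirements)",
    "model development plan",
    "model design document",
    "version-controlled code and documentation",
    "automated integration and regression tests",
    "maintenance strategy",
]

HIGH_RISK_ADDITIONS = [
    "Concept of Operations document",
    "System Requirements Document",
    "Requirements Traceability Matrix",
    "configuration management process",
    "unit tests and static analysis",
    "continuous incident recording",
]


def required_artefacts(risk: str) -> list[str]:
    """Return the artefacts expected for a given (hypothetical) risk tier."""
    if risk == "low":
        return list(BASELINE_ARTEFACTS)
    if risk == "high":
        return BASELINE_ARTEFACTS + HIGH_RISK_ADDITIONS
    raise ValueError(f"unknown risk tier: {risk!r}")
```

A project would typically extend such a table with the intermediate tiers and with the 
standards it has identified as relevant for its own CoU. 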

In the following, we first highlight some industry standards and a potential mapping 
to elements of model development. We then iterate through the stages of a life cycle 
model relevant to model development. In each section, we adopt a viewpoint from low 
and high model risk to derive two (example) levels of alignment with the cited industry 
standards and derive good simulation practice recommendations for model development 
(Tables 3.2, 3.3, 3.4, 3.5 and 3.6). 


2 “ISO standards are internationally agreed by experts.” https://www.iso.org/standards.html. 
Accessed 19 Sept. 2021. 

3 https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32017R0745. 

4 The final FDA guidance is now available (November) https://www.fda.gov/regulatory-information/search-fda-guidance-documents/assessing-credibility-computational-modeling-and-simulation-medical-device-submissions 


[Fig. 3.1, text recovered from the figure] 

Example CoUs along the risk spectrum: 
Low risk: predicting the durability of a material for a prosthetic implant under load in an animal model. 
Medium risk: justifying the dosing regimen decision of a confirmatory trial through rationalizing the impact of patient covariates. 
Medium-high risk: model extrapolating the long-term clinical effect of a drug in a pediatric population based on adult data. 
High risk: model anticipating the optimal time point of defibrillation through an implant. 

Development practices shown against this spectrum: 
Awareness of industry standards relevant for model development. 
Establishing Context of Use before starting model development. 
Model development plan, definition of requirements, model design document, version control for code, code documentation and user documentation, unit and integration testing reports for every release. 
Model development life cycle compliant with full range of relevant industry standards, e.g., ISO 12207 or IEC 62304:2006. 


Fig. 3.1 A risk-based approach for model development planning (simplified; for a complete model 
risk assessment, see ASME VV-40:2018). While a lean plan can suffice for low-risk CoUs, 
higher-risk projects must gradually consider the full range of relevant industry standards 
detailed in Sect. 3.2. Note that model risk needs to be anticipated by the developer during 
development. 

This Chapter’s scope is limited to model development practices and does not consider 
model use and validation aspects, which are covered in Chap. 4. 


3.2 SLC Industry Standards and Relevance for Model 
Development 


The previous section introduced a risk-based approach for model development, implementing 
different levels of compliance with industry standards. As no industry standard for 
computational models yet exists, the next best option is to adopt and apply standards 
and best practices from related areas similar to computational model development. Of 
particular interest to the development of predictive models is the great body of 
process-level knowledge and recommendations available for software development, not only 
because of the analogy between software and model development but also because the 
developed model and the software in which it is implemented are often intertwined. 

Two software development standards are relevant to the model development process: 
ISO/IEC/IEEE 12207⁵ and ISO/IEC 62304.⁶ The former applies to every software package or 
system, whereas the latter is specific to medical device software. Other relevant best 
practice documents include the National Aeronautics and Space Administration (NASA) 
standard and handbook on models and simulations (NASA-STD-7009A⁷ and NASA-HDBK-7009A⁸). 
A full mapping of all the model development activities, with processes from all 
potentially applicable industry standards, is out of this Chapter's scope. Instead, we 
intend to highlight the opportunities where an explicit consideration of industry 
standards supports overall quality, especially in critical applications. 


* The ISO/IEC/IEEE 12207:2017 standard “Software life cycle processes” covers different 
process groups, including (I) Agreement, (II) Organisational project-enabling 
processes, (III) Technical management processes, and (IV) Technical processes. Process 
groups III and IV are most relevant for a given model development project in a given 
organisational structure (for definitions of a process, see, e.g., ISO/IEC/IEEE 
24774:2021). We stress that management and technical processes are orthogonal. 

— Technical management processes are concerned with managing and applying the 
resources and assets allocated by the organisation's management. Technical management 
comprises (a) Project planning, (b) Project assessment and control process, (c) Decision 
management processes, (d) Risk management processes, (e) Configuration management, 
(f) Information management, (g) Measurement, and (h) Quality assurance processes. 

— Technical processes are concerned with technical actions throughout the life cycle. 
Technical processes transform the needs of stakeholders into a product or service. 
Technical processes include (a) Mission analysis, (b) Stakeholder needs and 
requirements definition, (c) Systems/software requirements definition, (d) Architecture 
definition, (e) Design definition, (f) System analysis, (g) Implementation, (h) 
Integration, (i) Verification, (j) Transition, (k) Validation, (l) Operation, (m) 
Maintenance and (n) Disposal, of which (a-h) may be classified as “development” and are 
covered in detail in this Chapter, while (i-k) are more relevant for model validation 
(see Chap. 4). Processes (l-n) need to be considered to qualify for sustained use or 
regulatory approval, where maintenance is often challenging in a research setting 
(Anzt et al., 2021). 

— ISO/IEC/IEEE 15288:2015 establishes a common framework of process descrip- 
tions for the life cycle of systems and is often used in conjunction with ISO/IEC/ 
IEEE 12207:2017. Additionally, ISO/IEC/IEEE 24748-3 describes the application 
of ISO/IEC/IEEE 12207:2017. 


5 https://www.iso.org/standard/63712.html. 

6 https://www.iso.org/standard/38421.html. 

7 https://standards.nasa.gov/standard/NASA/NASA-STD-7009. 

8 https://standards.nasa.gov/standard/nasa/nasa-hdbk-7009. 


* The standard ISO/IEC 62304:2006 “medical device software—software life cycle 
(SLC) processes” may be more readily adopted as it targets regulated environments in 
healthcare and is recognised by both the European Union as a Harmonised Standard 
and the United States FDA as a Recognized Consensus Standard. The structure of this 
standard is similar to ISO/IEC/IEEE 12207 (see overlap indicated in Table 3.1). In 
fact, modern development platforms that combine the ability to develop, secure, and 
operate software can be designed to establish compliance (see here⁹ for the case of 
ISO/IEC 62304). 


These ISO/IEC standards do not prescribe any particular life cycle model, acknowledg- 
ing that software development processes should be oriented towards the project objectives. 
Instead, they define a set of life cycle processes, which can be used to define the SLC. 
Depending on the SLC model used, the development phases may be classified as follows: 


— Analysis and Requirements: derive requirements from the mission analysis, the user 
perspective and the (integrated) system viewpoint as a function of the real-world 
system and potential application of M&S. 

— Design: collection of information and definition of concepts to include in the proposed 
model; the iterative process of creating the detailed, verifiable, and validated model 
specification and simulations for the intended use. 

— Implementation and Integration: realisation of the technical implementation of the 
model design in line with the requirements, specifications, and intended use. 

— Testing: checking to determine if the model meets all requirements and operational 
intentions. 

— Maintenance: release of the software, archiving of artefacts, life cycle management. 


Orthogonal to the technical processes, adherence to management processes can be benefi- 
cial or required. Also, adherence to industry standards rarely concerns one portion of the 
set of business processes but more likely impacts a large set of operations. Other standards 
covering quality management/assurance also apply, such as ISO 13485 (quality manage- 
ment system for designing and manufacturing medical devices) or, even more generally, 
ISO 9001. Also, consideration of a service management system specified by ISO/IEC 
20000-1, an IT asset management system specified by ISO/IEC 19770 (all parts), and an 
information security management system specified by ISO/IEC 27000 may be relevant. 


9 https://about.gitlab.com/solutions/iec-62304/. 


Table 3.1 Summary of relationships between the phases outlined in this document and most 
relevant phases and processes in selected industry standards and best practice documents 
on model development planning 

SLC phase | ISO/IEC/IEEE 12207 | ISO/IEC 62304 | NASA-HDBK-7009A 
Analysis and Requirements | Mission analysis | Software development planning | Model initiation 
 | Stakeholder needs and requirements definition | Software requirements analysis | 
 | Systems/software requirements definition | | 
Design | Architecture definition | Software architectural design | Model concept development 
 | Design definition | Software detailed design | Model design 
 | System analysis | | 
Implementation and Integration | Implementation | Software unit implementation and verification | Model construction 
 | Integration | Software integration and integration testing | 
Testing | Verification (developer and end user) | Software system testing | 
 | Transition | | 
 | Validation (developer and end user) | | 
Maintenance | Operation (end user) | | Model use (end user) 
 | Maintenance (developer and end user) | Software release | Model and Analysis Archiving 
 | Disposal | | 


3.2.1 Analysis and Requirements 


The software development life cycle is initiated by a planning phase where the overall 
mission, problem, and context are analysed, and the actual requirements are defined. 
Subsequently, a plan defines how the new development will fulfil the mission. In many life 
cycle models, this initial phase is called Elicitation. Depending on the model, a 
development plan that is either more project-oriented or more technically oriented can be 
the better choice. Table 3.2 lists good model development practices for establishing, 
considering and documenting a model development plan and the requirements definition 
document(s) for both low and high model risk scenarios (Fig. 3.1 left and right, 
respectively). Generally, we regard the high-risk recommendation as the gold standard, 
while simplified processes and documentation can be acceptable for low-risk situations. 


Table 3.2 Good practices for the analysis and requirements phase of model development on the low 
and high-risk ends of the risk spectrum (see Fig. 3.1, left and right, respectively) 

 | Low risk | High risk 
Requirement definitions (design prerequisite) | Mission requirements (CoU); Risks; User requirements; System requirements (if any) | Concept of Operations (ConOps) document; System Requirements Document (SRD); Requirements Traceability Matrix (RTM); see IEEE/ISO/IEC 29148-2018 
Model development plan (project management) | Model development plan similar to a software management plan, see e.g. (The Software Sustainability Institute, 2018) | Detailed model development plan similar to a software development plan, see ISO 62304-2006 and Rust et al. (2016); Reference Materials (including knowledge and data sources); Development and life cycle planning 

From a model developer viewpoint, this phase in the SLC overlaps with the definition 
of a CoU in ASME VV-40:2018 (see Table 3.1). The model developer must anticipate the 
required CoUs that the model will aim for (the CoUs anticipated by the developer might 
be captured in a Concept of Operations document and use cases). In high-risk contexts, 
CoU formulation should be embedded in requirements definitions aligned with industry 
standards from related disciplines until specific ones become available. 

Planning for the project is often captured in a Project Management Plan. ISO/IEC/IEEE 
16326 provides more detail on project planning. The project planning process aims 
to produce and coordinate effective, workable plans. This process determines the scope 
of the project management and technical activities, identifies process outputs, tasks, and 
deliverables, and establishes schedules for conducting tasks, including achievement 
criteria and required resources to accomplish tasks. Project planning is an ongoing process 
throughout a project with regular plan revisions. Technical planning for a software 
system is often captured in a Systems Engineering Management Plan, a Software Engineering 
Management Plan, or a Software Development Plan (SDP). ISO/IEC/IEEE 24748-5 provides 
more detail on software engineering technical management planning and includes 
an annotated outline for an SDP. Notably, ISO 62304-2006 and Rust et al. (2016) suggest 
an SDP structure commensurate with regulated environments. 

Good practice in model development is to establish such a development plan 
(roughly for “low risk” or as precisely as possible for “high risk”; Table 3.2), 
considering the related guidance on software development. 

Another important stage in the initial development life cycle phase of Elicitation is 
the definition of requirements. Requirements can come from many different sources, for 
example, user needs, functionality, performance, risk, regulatory, processes, or marketing. 
As stipulated in Table 3.2 (top row), both “low-risk” and “high-risk” models should doc- 
ument the requirements for their development. Explicitly adhering to industry standards 
might not be required for low-risk models. 

The requirements definition process captures and transforms stakeholder needs into 
“well-formed requirements” (suitable as inputs for subsequent model development 
procedures). A “well-formed requirement” shall possess the following attributes: necessary, 
appropriate, unambiguous, complete, singular, feasible, verifiable, correct, and 
conforming. IEEE/ISO/IEC 29148-2018 provides more detail on requirements engineering and 
requirement processes. The user and system requirements should be defined together with 
the formulated CoUs (equivalent to mission requirements). The entire set of requirements 
enables a common understanding between stakeholders and provides a reference for 
verification. The requirements must also be validated against real-world needs and be 
feasible to implement and check (potentially formulated as part of a System Requirements 
Document (SRD) or Requirements Traceability Matrix (RTM) in high-risk contexts). This 
enables users to practically judge whether usage scenarios are within the intended CoUs 
or are technically possible but outside of the CoUs. 
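
A Requirements Traceability Matrix can be as simple as a table that links each requirement 
to the tests that verify it. The following sketch uses hypothetical requirement IDs and 
test names and flags requirements that no test covers yet. The `Requirement` class and the 
`untraced` function are illustrative names chosen here; they are not taken from 
IEEE/ISO/IEC 29148-2018. 

```python
from dataclasses import dataclass, field


@dataclass
class Requirement:
    """One requirement and the tests that claim to verify it."""
    req_id: str
    text: str
    source: str                       # e.g. "CoU", "user", "system"
    verified_by: list[str] = field(default_factory=list)


def untraced(requirements: list[Requirement]) -> list[str]:
    """Return IDs of requirements that have no linked verification test."""
    return [r.req_id for r in requirements if not r.verified_by]


# Hypothetical entries for illustration only.
rtm = [
    Requirement("MR-1", "Predict implant durability under cyclic load", "CoU",
                ["test_cyclic_load_regression"]),
    Requirement("UR-1", "Report results with uncertainty bounds", "user"),
]
print(untraced(rtm))  # UR-1 has no linked test yet
```

Running such a check for every release gives an early warning when a requirement is added 
without a matching verification activity. 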


3.2.2 Design 


The key prerequisite for model design is the definition of the CoU(s) during the 
requirements specification. In the “design specification”, these CoU(s) must be translated 
into the architecture and component design of the actual model. As introduced in Chap. 2, 
the formulation of the model in terms of fundamental mathematical equations and parameters 
is crucial. On the one hand, the model needs to be complex enough to fulfil its CoU(s). 
On the other hand, unique parameter identifiability and parameter uncertainty (either on a 
population level during calibration or in a given subject during personalisation depending 
on the CoU) (Galappaththige et al., 2022; Parvinian et al., 2019), numerical accuracy and 
required computational effort as well as the options for verification and validation (Path- 
manathan & Gray, 2014; Pathmanathan et al., 2017) need to be considered. Following the 
law of parsimony, one should aim for the simplest model that can support the intended 
CoU. The decision-making process and limitations of this choice must be explicitly doc- 
umented (Erdemir et al., 2019). Table 3.3 lists specific aspects to be considered during 
the Design phase for low and high-risk applications. 


34 A. Kulesza et al. 


Table 3.3 Good practices for the design phase of model development on the low and high-risk ends 
of the risk spectrum 

Model design, low risk: simple model design document, comprising 
  Definition of a conceptual model: Which modelling approach is suitable for the CoU? What level of precision is required? Document the limitations of the model. 
  Definition of the architecture design: focus on functionality, covered hypotheses and phenomena. 
  Description of the detailed design: considering model-type specific recommendations; focus on expected sensitivity, identifiability. 
  User interfaces: human-machine interfaces, user experience, dialog design, presentation of information. 
  More detailed operational scenarios and use cases: for example, as a simulation plan or protocol (developed together with the Sponsor, see Chapter 10). 

Model design, high risk: comprehensive model design document (including the elements from low risk, compliant with relevant design documentation standards, see for example footnote a). Additionally: 
  Definition of the architecture design: (if relevant) describe design decisions related to performance, compatibility, transferability, usability, adaptability, reliability, security, maintainability, and data privacy. 
  User interfaces: describe also measures to avoid (user) errors, to avoid misinterpretation, and to increase use efficiency (ergonomics) and user satisfaction. Consider also the Usability Engineering File (ISO 14971, IEC 62366-1). 
  More detailed operational scenarios and use cases: document the decision process according to standards (e.g., ISO 13485, IEC 62304). 

Model validation plan: see Chapter 4 and FDA guidance (footnote b). 

a https://ntrs.nasa.gov/api/citations/20160011412/downloads/20160011412.pdf 
b “General Principles of Software Validation; Final Guidance for Industry and FDA Staff”, https://www.fda.gov/media/73141/download 


Baker (2018) gave recommendations for computational simulations relevant to operational 
scenarios and use cases (to be specified by the developer) as well as the simulations 
to be performed by the end user (CoUs). The initial Concept of Operations document 
(see the previous section) has to be updated to consider any limitations imposed by the 
chosen architecture and further operational details. 

As evident from the “low-risk” scenario, comprehensive documentation for the model 
design is needed, irrespective of the risk. The minimum requirement for all models is 
thus to justify alignment of the modelling concept (and potentially data flow) with the 
anticipated CoU and to document fundamental design choices (ODE, PDE, agent-based 
model, etc., the granularity) and associated limitations. In all cases, the architecture design 
(phenomena and hypotheses, composition, input-output transformations) and the detailed 
design (geometry, equations, parameters, initial/boundary conditions) must be documented 
and justified. This potentially includes the workflow with which unknown parameters are 
estimated from target data or other workflows to produce tailored versions of the model 
(mapping onto a geometry, treatments, outputs of other simulations etc.). For many fields, 
model-type-specific recommendations exist and should be considered, e.g., for pharma- 
cological modelling (Byon et al., 2013; Cucurull-Sanchez et al., 2019; Jean et al., 2021; 
Ke et al., 2016; Overgaard et al., 2015; Zhao et al., 2012). A comprehensive list of all 
domains is beyond the scope of this Chapter. 

Considering relevant external conditions, the decision to use a specific model architec- 
ture should be based on the defined requirements (see the previous section). The essential 
properties of the architecture are often defined by the required internal and external 
interfaces to be implemented. The design should use established community standards 
regarding data formats and application programming interfaces (APIs) whenever possible. 
One should generally favour simple architectures following the established best prac- 
tices in software engineering, such as information hiding, loose coupling, high cohesion, 
separation of concerns and hierarchical decomposition. 
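
As a small illustration of information hiding and loose coupling, the sketch below 
separates the model equations from the numerical solver behind a minimal interface. All 
names (`Integrator`, `ExplicitEuler`, `simulate`, `decay`) are hypothetical, and explicit 
Euler is chosen only for brevity, not as a recommended scheme. 

```python
from typing import Callable, Protocol

RightHandSide = Callable[[float, float], float]  # f(t, y) -> dy/dt


class Integrator(Protocol):
    """Interface the model code depends on; solver details stay hidden."""
    def step(self, f: RightHandSide, t: float, y: float, dt: float) -> float: ...


class ExplicitEuler:
    """One possible solver; it can be swapped without touching the model."""
    def step(self, f: RightHandSide, t: float, y: float, dt: float) -> float:
        return y + dt * f(t, y)


def simulate(f: RightHandSide, y0: float, t_end: float, n_steps: int,
             integrator: Integrator) -> float:
    """Integrate dy/dt = f(t, y) from t = 0 to t_end and return y(t_end)."""
    dt = t_end / n_steps
    t, y = 0.0, y0
    for _ in range(n_steps):
        y = integrator.step(f, t, y, dt)
        t += dt
    return y


def decay(t: float, y: float) -> float:
    """Hypothetical model equation: first-order decay with rate 0.5."""
    return -0.5 * y


y_end = simulate(decay, y0=1.0, t_end=2.0, n_steps=2000, integrator=ExplicitEuler())
```

Because `simulate` only relies on the `step` method, a different solver can be introduced 
and verified on its own, which keeps the model equations and the numerics separately 
testable. 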

Design decisions are documented in the software management plan, possibly created 
during the Elicitation phase. Model design documentation should be similar across all 
model risks, even though the extent and form may differ. Good practice in defining and 
describing the architecture and system design is to use diagrams and (standard) graphical 
notations, such as the Unified Modelling Language (UML),¹⁰ to help communicate with 
stakeholders, explore potential designs, validate the software architecture, and document 
decisions. Alongside the input and output data descriptions, detailing how these data 
enter and leave the system (i.e., interfaces) is also mandatory. Other relevant elements 
of the documentation (for example, a User Manual) can only be generated during/after 
implementation (see Sect. 3.2.3). 


10 Unified Modelling Language v2.5.1 (2017) by the OMG standardisation group: https://www.omg.org/spec/UML/. 


Notice that the design of the model should anticipate verification and validation 
activities (by the developer and/or the user) so that it is compatible with them. 
Therefore, a validation plan should be initiated during the design phase (see Chap. 4). 

As can be seen on the right side of Table 3.3, full compliance with industry standards 
relevant for models can necessitate exhaustive documentation of potential pitfalls, errors, 
practical and life cycle aspects, and may need to follow certain forms. 


3.2.3 Implementation and Integration 


The purpose of the implementation process is to realise a specified system element. 
This process transforms requirements, architecture, and design (including interfaces) into 
actions that create a system element according to the practices of the selected implemen- 
tation technology. This process results in a system element that satisfies specified system 
requirements, architecture, and design. 

The integration process aims to synthesise a set of system elements into a functional 
system (product or service) that satisfies system/software requirements, architecture, and 
design. This process assembles the implemented system elements. Previously defined 
interfaces are activated to enable the interoperability of the system elements as intended. 

During implementation and integration (Table 3.4), established software engineering 
best practices should be applied (Anzt et al., 2021; Rust et al., 2016). Fundamental 
requirements specify using a version control system, such as git, and software docu- 
mentation in the form of the source code and a user manual. Requirements and issues 
should be tracked and linked using an adequate infrastructure. Specific versions of the 
implemented model must be assigned unique identifiers (e.g., build number and date). 
In the case of software as a medical device, persistent unique identifiers are required. 
Release software versions should be archived with relevant artefacts like documentation, 
test reports, etc. and should consider the FAIR principles for research software where 
applicable (Chue Hong et al., 2021). 
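
One way to assign such identifiers is to combine the version, the build number and the 
build date in a single immutable record. The sketch below is a minimal illustration; the 
class name, the field names, the label format and the example values are assumptions, and 
a persistent identifier scheme for software as a medical device would need more than this. 

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ModelRelease:
    """Immutable record identifying one released model version."""
    name: str
    version: str        # e.g. "1.2.0"
    build_number: int
    build_date: date

    @property
    def identifier(self) -> str:
        """Unique label combining version, build number and build date."""
        return (f"{self.name}-{self.version}"
                f"+build{self.build_number}.{self.build_date:%Y%m%d}")


# Hypothetical release used for illustration only.
release = ModelRelease("implant-durability", "1.2.0", 42, date(2024, 1, 15))
print(release.identifier)  # implant-durability-1.2.0+build42.20240115
```

The same label can then be attached to the archived documentation and test reports so that 
every artefact is traceable to exactly one model version. 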


Table 3.4 Good practices for the implementation and integration phase of model development on 
the low-risk and high-risk ends of the risk spectrum 

 | Low risk | High risk 
Versioned model | Storage of code versions in version control systems, e.g., git. Documentation of all model versions (both in the form of the source code and a user manual). | Version control, e.g., in line with ISO 12207, through a Configuration Management process for the selection of configuration items to be integrated. 


As discussed in the next section, automated tests can help increase the efficiency of 
implementation efforts. 

The more regulated the environment and the higher the risk of the CoU, the more 
important it becomes to strictly standardise these processes. Such standards can, for 
example, include code style guidelines and conventions. 


3.2.4 Testing 


Testing serves the purpose of checking whether the model was implemented correctly. It 
answers the question “Did we build the model right?” as opposed to validation, which 
addresses “Did we build the right model?” Both aspects are covered in detail in Chap. 4, 
so we only point out a few specific issues from a model developer's perspective in 
Table 3.5. 

Various forms of tests can be applied, including: 


— Regression tests, which run the model with specified input parameters and compare the 
output against previously computed results. 

— Simplified test cases with analytical solutions, which enable an evaluation of the 
quality of the solution. 

— Performance tests, which can help obtain efficient code but are not the focus of the GSP. 


Regardless of the approach, testing should be automated as part of continuous integration 
pipelines. This will increase adoption and adherence by minimising developer efforts to 
maintain compliance. 
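
The regression and analytical tests listed above can be sketched in a few lines. The toy 
logistic growth model, the function names, the tolerances and the JSON reference file are 
assumptions made for illustration; a real project would test its own model, and the 
functions could be collected by a test runner such as pytest within a continuous 
integration pipeline. 

```python
import json
import math
from pathlib import Path


def simulate_logistic(r: float, k: float, y0: float, t_end: float, n_steps: int) -> float:
    """Toy model: logistic growth dy/dt = r*y*(1 - y/k), explicit Euler."""
    dt = t_end / n_steps
    y = y0
    for _ in range(n_steps):
        y += dt * r * y * (1.0 - y / k)
    return y


def logistic_exact(r: float, k: float, y0: float, t: float) -> float:
    """Closed-form solution of the logistic equation."""
    return k / (1.0 + (k - y0) / y0 * math.exp(-r * t))


def test_against_analytical_solution() -> None:
    """Verification: the numerical result approaches the exact solution."""
    numeric = simulate_logistic(r=1.0, k=10.0, y0=1.0, t_end=2.0, n_steps=20000)
    assert math.isclose(numeric, logistic_exact(1.0, 10.0, 1.0, 2.0), rel_tol=1e-3)


def check_regression(name: str, value: float, reference_file: Path,
                     rel_tol: float = 1e-12) -> bool:
    """Compare `value` with a stored reference; record it if none exists yet."""
    references = json.loads(reference_file.read_text()) if reference_file.exists() else {}
    if name not in references:
        references[name] = value
        reference_file.write_text(json.dumps(references, indent=2))
        return True  # first run: baseline recorded, nothing to compare
    return math.isclose(value, references[name], rel_tol=rel_tol)
```

The reference file would be kept under version control, so that any change in the stored 
values is a deliberate, reviewed decision rather than a silent drift. 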


Table 3.5 Good practices for the testing phase of model development on the low-risk and high-risk 
ends of the risk spectrum 

 | Low risk | High risk 
Tests | Automatic tests during development: integration tests; system tests/regression tests | Additional automatic tests throughout the life cycle: unit tests; consider static analysis; maximize unit testing code coverage 
User feedback | Optional | Essential 


3.2.5 Maintenance 


The need for maintenance can arise from multiple causes other than model bugs, such 
as version updates of the model solver, functionality, material laws or any of its con- 
stituent parts, changes in external dependencies (e.g., software libraries, compilers) or 
changes in the regulatory framework, requiring re-evaluation of specific credibility factors. 
Lehman's laws of software evolution postulate that software must continuously evolve to 
remain useful (Lehman, 1980). This section focuses on the maintenance of the model 
itself (developer's perspective) rather than the maintenance of specific simulations (user's 
perspective). It assumes that a model is developed for several uses by different users. 

The model development plan should document the maintenance strategy (ISO/IEC/IEEE 
12207:2017). The Medical Device Regulation (MDR) requires a post-market surveillance 
plan and periodic safety update reports, which indicate what good maintenance practice 
looks like in high-risk contexts (pointing towards continuous monitoring activities). 

Release versions should be archived with associated data, documentation, and simulation 
logs for test cases (NASA-HDBK-7009¹¹). Version control is critical for accurate 
interpretation, repeatability, reproducibility, and debugging of the simulation predictions 
(Erdemir et al., 2020), and thus for the model’s credibility. Ideally, automated tests (see 
Sect. 3.2.4) are run for each version in continuous integration setups and a standard 
workflow for releases is defined in continuous deployment setups (NASA-STD-7009A¹²). 
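
Archiving a release together with its data, documentation and logs is easier to audit when 
the archive carries a manifest with checksums. The sketch below shows one possible layout; 
the function names and the manifest structure are assumptions and do not follow a format 
prescribed by the NASA documents. 

```python
import hashlib
from pathlib import Path


def build_manifest(release_dir: Path, release_id: str) -> dict:
    """List every file in a release folder with its size and SHA-256 checksum."""
    files = {}
    for path in sorted(release_dir.rglob("*")):
        if path.is_file():
            files[path.relative_to(release_dir).as_posix()] = {
                "bytes": path.stat().st_size,
                "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
            }
    return {"release": release_id, "files": files}


def verify_manifest(release_dir: Path, manifest: dict) -> list[str]:
    """Return the files whose current checksum differs from the manifest."""
    current = build_manifest(release_dir, manifest["release"])["files"]
    return [name for name, meta in manifest["files"].items()
            if current.get(name, {}).get("sha256") != meta["sha256"]]
```

Storing the manifest outside the archived folder and re-running `verify_manifest` before a 
release is reused helps confirm that code, data and logs are still the ones that were 
tested. 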

As a good practice, irrespective of the model risk, new release versions require an addi- 
tional verification and validation iteration by the developer to guarantee that the released 
version of the model sustains its credibility for the CoU. In high-risk contexts, continu- 
ously monitoring the model’s capability to deliver credible results, recording incidents for 
analysis, taking corrective, adaptive, perfective, and preventive actions, and confirming 
restored capability (ISO/IEC/IEEE 12207:2017) are likely indicated. 

End-of-life decisions will eventually be taken. It is important to inform the users in 
good time about the supported time frame for the model and to clearly communicate 
which support and training measures are available for users during which phases of the 
life cycle. Released versions must be archived and preserved in a format that permits 
execution beyond the supported lifetime. One solution can be software containers (e.g., 
Docker) that include all dependencies and only rely on an abstract execution layer that 
will be supported for an extended period. Also, model disposal measures prevent 
out-of-date model versions from returning to the supply chain (ISO/IEC/IEEE 12207:2017) 
unless explicitly required. 


11 https://standards.nasa.gov/standard/nasa/nasa-hdbk-7009. 
12 https://standards.nasa.gov/standard/nasa/nasa-std-7009. 


3 Model Development 39 


Table 3.6 Good practices for the maintenance phase of model development on the low-risk and 
high-risk ends of the risk spectrum 

 | Low risk | High risk 
Maintenance | Execution of the maintenance strategy in the model development plan. For each release, report the model's capability to deliver credible results, archived together with associated data, documentation, and simulation logs for test cases. | Additionally: continuous recording of incidents, taking corrective, adaptive, perfective, and preventive actions, and confirming restored capability according to ISO/IEC/IEEE 12207:2017. 
test cases 


3.3 Conclusions 


In this Chapter, we outlined an approach to guide model development best practices based 
on a given CoU. Notice that this Chapter did not directly address best practice for the 
end user of the model. Numerous industry standards provide guidance on how to plan, 
implement, test, and maintain software as part of medical devices and thus in critical, 
regulated environments. As mathematical models for healthcare often take the form of 
software, the application of an adapted industry standard from software development, for 
example, ISO/IEC/IEEE 12207:2017 or ISO/IEC 62304:2006, seems possible. However, full 
compliance with industry standards is not always required or advisable. We, therefore, 
suggest using model risk (as defined in Chap. 4) to guide the stringency and level of 
adherence to industry standards. As a best practice, all models should comply with 
minimum requirements, anticipating that maximising compliance helps with 
model/software/platform qualification/certification in regulatory processes. 

Life cycle planning, reported in a model development plan, is suggested as a critical 
step before implementation. Templates (e.g., SDP for medical device software in regulated 
environments (The Software Sustainability Institute, 2018)) are available and can help 
to set up high-risk models compliant with current and future regulatory requirements. 
Requirements must be derived from a detailed analysis of the CoU, the mission and user 
needs and then documented and traced throughout the development. 

Of particular importance are the documentation of the model formulation and archi- 
tecture design decisions, the design itself, and interfaces derived from the requirements 
as part of a model design document, including a description of intended use cases. 
Good software development practices should be followed during model implementation 
and integration, such as version control and the provision of tested code and end-user 
documentation. 

During development and maintenance (as defined in the development plan), integration 
and systems testing should be performed and reported systematically and automatically. 
More involved testing paradigms (e.g., unit tests) and continuous monitoring must be 
envisaged for high-risk environments. Also, testing confirming model credibility for the
CoU must be repeated and reported with every model release in such environments. 

This set of good model development practices provides a general, yet hopefully tan- 
gible, framework that applies to a wide range of in silico models and CoUs spanning 
different risk levels. 


3.4 Essential Good Simulation Practice Recommendations 


— Establish your model’s CoU(s), related risks and requirements in a model development 
plan before defining and implementing the model (Table 3.2). 

— Identify relevant industry standards for your model (Sect. 3.2). 

— When designing the model for your CoU(s), consider relevant domain-specific stan- 
dards, parameter identifiability and options for software verification and validation. 
Document the decision-making process for the conceptual model and the resulting 
limitations in the model design document (Table 3.3). 

— Implement the model software based on established good practices for software engi- 
neering and development (Table 3.4) and follow a test-driven development paradigm 
(Table 3.5). 


— Consider the entire model life cycle in the model management plan and secure adequate 
resources for maintenance (Table 3.6). 


4.1 Introduction


The lack of a framework to justify that a model has sufficient credibility to be used as a
basis for internal or external (typically regulatory) decision-making is a primary concern
when using modelling and simulation (M&S) in healthcare. Verification, validation,
uncertainty quantification (VVUQ), and applicability assessment are processes used to
establish the credibility of a computational model. Verification establishes that a
computational model accurately represents the underlying mathematical model and its
solution. In contrast, validation establishes whether the mathematical model accurately
represents the reality of interest. Uncertainty quantification helps to identify potential
limitations in the modelling, computational, or experimental processes due to inherent
variability (aleatoric uncertainty) or lack of knowledge (epistemic uncertainty). Finally,
applicability assesses the relevance of the validation evidence to support using the model
for a specific Context of Use (CoU) (Pathmanathan et al., 2017).

Various global organisations have formalised some of these concepts in guidance documents
or technical standards for specific use cases. Annex 1 systematically reviews
this existing body of knowledge for various computational model types, ranging from 
QSAR to ABMs to physics-based models. And given the increasing interest in in silico 
methodologies, various global standards bodies (e.g., ICH, ISO) are currently developing 
or revising guidelines and standards in this field. 
This chapter outlines concepts related to model credibility assessment in a way that
is agnostic to the nature of the computational model type or medical product, extracting
the relevant concepts common to the aforementioned standards and guidance documents.
To be as inclusive as possible, a level of granularity has been chosen that can incorporate
most of the existing knowledge, but which may result in the grouping or omission of steps
that do not exist in every standard or guidance. One example is the EMA's distinction
between technical and clinical validation. This was outlined in a letter of support to
a request for qualification advice on the use of digital mobility outcomes (DMOs) as
monitoring biomarkers,¹ where it was stated that "The technical validation will verify
the accuracy of the device and algorithm to measure a range of different DMOs. [...]
clinical validation will be obtained in an observational multicentre clinical trial" (see also
(Viceconti et al., 2020) for more details). In the follow-up qualification advice, the same
authors propose that a DMO is considered clinically validated for a well-defined CoU
when one can demonstrate its construct validity, predictive capacity, and ability to detect
change (Viceconti et al., 2022). Translating to in silico methodologies, both technical
and clinical aspects of model predictions must be evaluated during validation. A clinical
interpretation of the model validation should also be provided to support the clinical
credibility of the predicted quantities.

Recognizing the lack of a globally harmonised framework, this chapter will describe 
the concepts related to model credibility, supported by illustrative examples and references 
to standards, guidance, and additional documents that provide further clarification. We 
also introduce a hierarchical validation approach that distinguishes between a model's 
physiological, pathological, and treatment layers. 


4.2 Model Credibility in Existing Regulatory Guidelines 


Regulatory agencies have provided operational guidelines for assessing a predictive
model's credibility (see Annex 1 for a complete list of all guidance and standard
documents). For example, a 2003 FDA guideline on exposure-response relationships²
acknowledged that "The issue of model validation is not totally resolved". It recommended
separating the training set from the validation set of experimental data (implicitly
assuming the models are all data-driven). Also, the 2018 EMA guideline on reporting
PBPK models³ recommends validating models against experimental clinical studies of
more than 100 patients. It also provides instructions on how the comparison between
predictions and experiments should be graphed. While this guideline does not explicitly
refer to a risk-based credibility assessment, it states: "The acceptance criteria (adequacy
of prediction) for the closeness of the comparison of simulated and observed data depends
on the regulatory impact". At around the same time, a 2018 FDA guidance on PBPK models⁴
requests VVUQ evidence in a generalised sense: "To allow the FDA to evaluate the
robustness of the models, the sponsor should clearly present results from the methods
used to verify⁵ the model, confirm model results, and conduct sensitivity analyses."
However, this guidance also requests that electronic files related to the modelling software
and simulations be submitted along with the PBPK study report, to "allow FDA reviewers
to duplicate and evaluate the submitted modeling and simulation results and to conduct
supplemental analyses when necessary". This request may overlook the complexities of
reproducing studies involving computational models.


1 EMA letter "Letter of support for Mobilise-D digital mobility outcomes as monitoring
biomarkers", Apr 2020.
https://www.ema.europa.eu/en/documents/other/letter-support-mobilise-D-digital-mobility-outcomes-monitoring-biomarkers_en.pdf.

2 FDA guidance "Exposure-Response Relationships – Study Design, Data Analysis, and Regulatory
Applications", Apr 2003. https://www.fda.gov/media/71277/download.

The 2016 FDA guidance on Reporting of Computational Modeling Studies in Medical
Device Submissions⁶ outlined the importance of providing a complete and accurate
summary of computational modelling and simulation evidence that is included in a dossier.
This guidance referenced the ASME VV-10:2019⁷ and ASME VV-20:2009 (R2021)⁸
standards, but not ASME VV-40:2018⁹ because it was not yet published. However, the
2023 FDA guidance¹⁰ outlines a generalised framework for assessing model credibility
that relies heavily upon the ASME VV-40:2018 standard. This guidance proposes
eight possible categories of credibility evidence (see Table 4.1). It is important to note
that categories 1, 3 and 8 are explicitly within the scope of ASME VV-40:2018, while the
others may be considered extensions of the ASME VV-40:2018 framework. While this
guidance acknowledges that there are different types of credibility evidence, the issue of
different levels of credibility (as proposed in Sect. 2.7) is not considered.


Table 4.1 Eight categories of credibility evidence. Reprinted from FDA guidance "Assessing the
Credibility of Computational Modeling and Simulation in Medical Device Submissions", Nov 2023

Category | Definition
1 Code verification results | Results showing that a computational model implemented in software is an accurate implementation of the underlying mathematical model
2 Model calibration evidence | Comparison of model results with the same data used to calibrate model parameters
3 Bench test validation results | Validation results using a bench test comparator. May be supported by calculation verification and/or UQ results using the validation conditions
4 In vivo validation results | Same as previous category except using in vivo data as the comparator
5 Population-based validation results | Comparison of population-level data between model predictions and a clinical data set. No individual-level comparisons are made
6 Emergent model behaviour | Evidence showing that the model reproduces phenomena that are known to occur in the system at the specified conditions but were not pre-specified or explicitly modelled by the governing equations
7 Model plausibility evidence | Rationale supporting the choice of governing equations, model assumptions, and/or input parameters only
8 Calculation verification/UQ results using CoU evidence | Calculation verification and/or UQ results obtained using the CoU simulations, that is, the simulations performed to answer the question of interest


3 EMA guideline "Guideline on the Reporting of Physiologically Based Pharmacokinetic (PBPK)
Modelling and Simulation", Dec 2018.
https://www.ema.europa.eu/documents/scientific-guideline/guideline-reporting-physiologically-based-pharmacokinetic-pbpk-modelling-simulation_en.pdf.

4 FDA guidance "Physiologically Based Pharmacokinetic Analyses – Format and Content", Dec
2018. https://www.fda.gov/media/101469/download.

5 Please note that the term 'verify' is used in place of validate in the guidance document cited in
Footnote 4.

6 FDA guidance "Reporting of Computational Modeling Studies in Medical Device Submissions",
Sep 2016. https://www.fda.gov/media/87586/download.

7 ASME standard "Standard for Verification and Validation in Computational Solid Mechanics
VV 10 - 2019", 2020.
https://www.asme.org/codes-standards/find-codes-standards/v-v-10-standard-verification-validation-computational-solid-mechanics.

8 ASME standard "Standard for Verification and Validation in Computational Fluid Dynamics and
Heat Transfer VV 20 - 2009 (R2021)", 2009.
https://www.asme.org/codes-standards/find-codes-standards/v-v-20-standard-verification-validation-computational-fluid-dynamics-heat-transfer.

9 ASME standard "Assessing Credibility of Computational Modeling through Verification and
Validation: Application to Medical Devices", 2018.
https://www.asme.org/codes-standards/find-codes-standards/v-v-40-assessing-credibility-computational-modeling-verification-validation-application-medical-devices.

10 FDA guidance "Assessing the Credibility of Computational Modeling and Simulation in Medical
Device Submissions", Nov 2023. https://www.fda.gov/media/154985/download.


4.3 A Standard Framework: ASME VV-40:2018 


The American Society of Mechanical Engineers (ASME) Committee on Verification, Val- 
idation, and Uncertainty Quantification in Computational Modeling and Simulation has 
published the ASME VV-10:2019 and VV-20:2009 (R2021) standards, which outline the 
processes of verification, validation, and uncertainty quantification for finite element anal- 
ysis and computational fluid dynamics, respectively. These standards outline VVUQ best 
practices, but do not provide formalised procedures for steering model validation (and 
thus the associated model development activities) towards being sufficiently credible for
a CoU. 

This led to the formation of the ASME VVUQ 40 Subcommittee on Verification, Val- 
idation, and Uncertainty Quantification in Computational Modeling of Medical Devices. 
Through close collaboration between medical device manufacturers, regulatory agencies, 
and other device industry stakeholders, this subcommittee published the standard “Assess- 
ing Credibility of Computational Modeling and Simulation Results through Verification 
and Validation: Application to Medical Devices" in 2018. This standard introduces a risk- 
informed credibility assessment framework for physics-based models that can be applied 
to various scientific, technical, and regulatory questions. And while the standard is writ- 
ten with a focus on medical devices, the framework is general enough to be recast to 
a variety of applications, including model-informed drug development and physiologi- 
cally based pharmacokinetic modelling (Kuemmel et al., 2020; Musuamba et al., 2021; 
Viceconti et al., 2021b). 

The ASME VV-40:2018 risk-informed credibility assessment framework is shown in 
Fig. 4.1. Model credibility activities begin by stating the question of interest, which 
describes the specific question, decision, or concern being addressed (at least in part) 
by the computational model. The next step is to define the CoU, which aims to describe 
the role and scope of the model and how it is going to be used in relation to other forms 
of evidence, e.g., in vitro or in vivo data, to address the question of interest (see Viceconti 
et al. (2021a) for examples of CoUs). The overall model risk is then assessed for the CoU, 
where risk is a combination of model influence and decision consequence (see Fig. 4.2). 
Model influence is defined as the contribution of the computational model relative to other 
available evidence when answering the question of interest, and decision consequence is 
the consequence (on the patient, for the clinician, business, and/or regulator) if an incor- 
rect decision is made that is based at least partially on the model. The overall model risk 
sets the requirements for model credibility, determining the required degrees of model 
verification, validation, uncertainty quantification, and applicability such that the model 
has sufficient credibility for the CoU. 

Fig. 4.1 Process diagram for the risk-informed credibility assessment framework. Reprinted from
ASME VV-40:2018, by permission of The American Society of Mechanical Engineers. All rights
reserved
[Figure not reproduced. Recoverable box labels: Define CoU; Assess model risk; Establish
credibility goals; Establish plan; Execute plan; "Computational model credible for CoU?"
(with a "No" branch); Documentation and evidence. A heading "Establish Risk-Informed
Credibility" spans part of the diagram.]

Fig. 4.2 Schematic of how model influence and decision consequence determine overall
model risk. Reprinted from ASME VV-40:2018, by permission of The American Society of
Mechanical Engineers. All rights reserved
[Figure not reproduced. Recoverable labels: axis "Decision Consequence"; regions "Model risk
LOW" and "Model risk HIGH".]

As an example of how the CoU drives risk-informed credibility, a computational model
used for a diagnosis that is also supported by medical imaging and clinical assessment
would have lower model risk versus a scenario where the diagnosis relies solely on the
computational model. As another example, a model used to define the optimal dosing
regimen for a phase 3 clinical trial will have a lower risk if it is complemented with
exploratory results from an in vivo phase 2 clinical trial than if used alone. Both scenarios
illustrate the impact of model influence on risk, where the lack of supporting evidence to
answer the question of interest means that the model credibility requirements are greater.
As an example of the impact of decision consequence, consider a model used to make
decisions about a medical device whose adverse outcome could result in severe patient
injury or death. This case will generally be associated with a higher risk than a model
used to make decisions regarding a medical device whose adverse outcome would not
significantly affect patient safety or health.
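
The way model influence and decision consequence combine into an overall model risk can be sketched as a small lookup. This is only an illustration: the three-level scales, the additive scoring rule, and the names `Level` and `model_risk` are hypothetical choices made here and are not taken from ASME VV-40:2018.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical three-level ordinal scale (an assumption, not from the standard)."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def model_risk(influence: Level, consequence: Level) -> Level:
    """Combine model influence and decision consequence into an overall model risk.

    Assumed rule for illustration: add the two ordinal scores and bin the total.
    """
    score = influence + consequence  # ranges from 2 to 6
    if score <= 3:
        return Level.LOW
    if score == 4:
        return Level.MEDIUM
    return Level.HIGH


# Diagnosis also supported by imaging and clinical assessment: low model influence.
supported = model_risk(Level.LOW, Level.HIGH)
# Same decision relying solely on the model: high model influence.
model_only = model_risk(Level.HIGH, Level.HIGH)
print(supported.name, model_only.name)
```

With the same decision consequence, the scenario that relies solely on the model is assigned the higher risk, which mirrors the diagnosis example above.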

Model risk assessment may also be complemented by a regulatory impact assessment
for certain applications, which describes what evidence would have been provided in the 
regulatory dossier had it not been for the inclusion of the digital evidence (Musuamba 
et al., 2020). 


4.4 Verification 


Verification aims to quantify the part of the predictive error due to the numerical approx- 
imations/representations. To effectively separate the three sources of predictive error, the 
numerical error should be negligible compared to the sum of epistemic and aleatoric errors 
(see Chap. 2 for details). 

There are three possible sources of numerical error in mechanistic models: procedural
errors, numerical approximation (round-off) errors, and numerical discretisation
errors. The first is explored through code verification, while the second and third are
estimated through calculation verification (Roy & Oberkampf, 2011) (see also ASME
VV-10:2019 and ASME VV-20:2009 (R2021)).


4.4.1 Code Verification 


Code verification aims to identify and, if possible, remove the procedural errors in the
source code and the numerical approximation errors in the solution algorithms. Code verification
testing should be performed for each computing platform, i.e., hardware configuration and 
operating system. 

Code verification relies on software verification tests such as unit tests, integration tests, 
and case tests. These tests can be conducted through self-developed or existing automatic 
software regression suites. In addition, quality control, portability and version control
should also be considered (see Chap. 2). Code verification also involves developing and 
implementing a certified software quality assurance (SQA) program to help ensure the 
integrity of existing code capabilities during development. 

In order of increasing rigour, one can use the following code verification methodologies
to identify deficiencies: expert judgement, code-to-code comparisons, discretisation
error estimation, convergence studies, and calculating the observed order of accuracy, to
ensure an appropriate minimisation of numerical approximation errors.¹¹ Besides expert
judgement and code-to-code comparison, each approach requires comparing code out- 
puts to analytical solutions (or at least mathematical conditions that ensure asymptotic 
convergence). Traditional engineering problems are one source of analytical solutions, 
e.g., laminar flow in a pipe or bending of a beam. However, because of their simplicity, 
these solutions are often limited in their ability to verify the full breadth of the source 
code. The Method of Manufactured Solutions (MMS) provides a more general source of 
analytical solutions (Roache, 2019), and the Method of Rotated Solutions (MRS) can be 
utilised to expand the scope of traditional engineering problems to provide a broader code 
coverage (Horner, 2021). Documented results from verification studies conducted by the 
software developer may also serve as a source of data to support code verification; how- 
ever, since numerical accuracy is also hardware-dependent, it is a good practice to repeat 
those verification tests on the same hardware that will be used to run the models once in 
use. 
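
As a minimal sketch of the most rigorous option listed above, the observed order of accuracy can be computed by comparing code output with an analytical solution on successively refined discretisations. The test problem (forward Euler integration of dy/dt = -k*y) and the function names are assumptions chosen for illustration. A real code verification study would exercise the solver that is actually under assessment.

```python
import math


def euler_decay(k: float, y0: float, t_end: float, n_steps: int) -> float:
    """Integrate dy/dt = -k*y with the forward Euler method and return y(t_end)."""
    dt = t_end / n_steps
    y = y0
    for _ in range(n_steps):
        y += dt * (-k * y)
    return y


def observed_order(err_coarse: float, err_fine: float, refinement: float) -> float:
    """Observed order of accuracy from the errors on two discretisation levels."""
    return math.log(err_coarse / err_fine) / math.log(refinement)


k, y0, t_end = 1.0, 1.0, 1.0
exact = y0 * math.exp(-k * t_end)  # analytical solution used as the reference

step_counts = [100, 200, 400]  # each level halves the time step (refinement ratio 2)
errors = [abs(euler_decay(k, y0, t_end, n) - exact) for n in step_counts]
orders = [observed_order(errors[i], errors[i + 1], 2.0) for i in range(len(errors) - 1)]

for n, e in zip(step_counts, errors):
    print(f"n={n:4d}  error={e:.3e}")
print("observed orders:", [round(p, 3) for p in orders])
```

Forward Euler is formally first-order accurate, so an observed order close to 1 supports the conclusion that the implementation is correct. A markedly different value would point to a coding error or to a discretisation that is not yet in the asymptotic range.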

Lastly, it is important to note that the scope of the code verification study must include 
only those portions of the simulation platform (e.g., model form, element type, solver) 
that will be accessed as part of validation and model deployment. 


11 ASME Standard "Standard for Verification and Validation in Computational Solid Mechanics
VV 10 - 2019", 2020.
https://www.asme.org/codes-standards/find-codes-standards/v-v-10-standard-verification-validation-computational-solid-mechanics.


4.4.2 Calculation Verification


Calculation verification (also sometimes referred to as solution verification) aims to 
estimate the upper bound for the numerical approximation error. 

A first important step in calculation verification is to estimate the magnitude of the
numerical errors caused by the discrete formulation of a mathematical model, e.g.,
iterative errors and discretisation errors. To this end, the spatial and temporal
convergence behaviour of the numerical solution is analysed by refining the
discretisation parameters and convergence tolerances, which yields an estimate of the
numerical errors associated with using a given model.
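
When no analytical solution is available, the discretisation error can be estimated from the solutions on three systematically refined levels. The sketch below uses Richardson extrapolation, a widely used technique that is not named in the text above. The composite trapezoidal rule is only a stand-in for a discretised model, and all function names are hypothetical.

```python
import math


def trapezoid(f, a: float, b: float, n: int) -> float:
    """Composite trapezoidal rule with n equal intervals (stand-in for a solver)."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return h * total


def richardson_estimate(f_coarse: float, f_medium: float, f_fine: float, r: float):
    """Return (observed order, extrapolated value, error estimate on the fine level).

    Assumes a constant refinement ratio r and monotonic convergence.
    """
    p = math.log(abs(f_medium - f_coarse) / abs(f_fine - f_medium)) / math.log(r)
    error_fine = (f_fine - f_medium) / (r**p - 1.0)
    return p, f_fine + error_fine, abs(error_fine)


r = 2.0
levels = [trapezoid(math.sin, 0.0, math.pi, n) for n in (20, 40, 80)]
p, extrapolated, err_est = richardson_estimate(*levels, r)

print(f"observed order = {p:.3f}")
print(f"finest solution = {levels[2]:.8f}, estimated error = {err_est:.2e}")
print(f"extrapolated value = {extrapolated:.8f}")
```

The estimate is only meaningful when the three levels lie in the asymptotic convergence range. An observed order far from the formal order of the scheme is a warning that further refinement is needed before the error estimate can be trusted.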

A sensitivity analysis could also be used to ensure that the calculation does not present
a particular combination of input values around which slight variations in the inputs cause
significant variations in the outputs (chaoticity). Such an occurrence might be due to a
software bug or an insufficiently robust solver implementation, which would ideally have
been caught during code verification, or to ill-conditioning of the numerical problem due
to an unfortunate combination of input values.

Finally, calculation verification should ensure that user errors are not corrupting the 
simulation outputs. No matter how accurate the calculations, the predictive model would 
be unreliable if the result is inaccurately transcribed due to a typing error. 


4.5 Validation


4.5.1 A General Definition


As outlined in Chap. 2, validation aims to estimate the prediction error and associated 
uncertainty of a computational model. An essential part of the validation exercise is the 
evaluation of the model input(s) and output(s) for the various quantities of interest (QoIs) 
against a comparator, e.g., the experimental data that are used for validation. The com- 
parator should be relevant to the defined CoU and cover a sufficient sample size (“Test 
samples” in ASME VV-40:2018) as well as the desired range of inputs (“Test condi- 
tions” in ASME VV-40:2018). As mentioned in the 2023 FDA guidance (see footnote 
10), acceptable forms of comparators include in vitro, ex vivo, or in vivo test data; these 
tests may be performed ex novo as part of the validation process (e.g., prospective clin- 
ical trial) or based upon historical data (e.g., retrospective clinical trial) or real-world 
evidence. To evaluate the model's capacity to predict the QoIs, this comparator must not be
used during model development or the calibration process.

Additionally, uncertainty quantification is critical to validating an in silico study. This
includes estimating the uncertainty associated with the comparator inputs and outputs and
propagating input uncertainties through the computational model to estimate uncertainty
in each QoI.
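
The comparison of a QoI against its comparator, together with the associated uncertainties, can be sketched as follows. The root-sum-square combination assumes independent uncertainty sources. That rule, the coverage factor of 2, the class and function names, and all numerical values are hypothetical and serve only as an illustration. Acceptance criteria in practice depend on the CoU and the model risk.

```python
import math
from dataclasses import dataclass


@dataclass
class ValidationPoint:
    """One QoI comparison between the model and the comparator."""
    simulated: float     # model prediction of the QoI
    measured: float      # comparator value of the QoI
    u_numerical: float   # standard uncertainty from calculation verification
    u_input: float       # standard uncertainty propagated from model inputs
    u_comparator: float  # standard uncertainty of the comparator measurement


def comparison_error(point: ValidationPoint) -> float:
    """Difference between simulated and measured QoI."""
    return point.simulated - point.measured


def validation_uncertainty(point: ValidationPoint) -> float:
    """Root-sum-square combination, assuming independent uncertainty sources."""
    return math.sqrt(point.u_numerical**2 + point.u_input**2 + point.u_comparator**2)


def within_uncertainty(point: ValidationPoint, coverage: float = 2.0) -> bool:
    """True if the comparison error is covered by the expanded uncertainty."""
    return abs(comparison_error(point)) <= coverage * validation_uncertainty(point)


# Hypothetical values for two test conditions (illustrative only, not real data).
points = [
    ValidationPoint(10.4, 10.0, u_numerical=0.05, u_input=0.30, u_comparator=0.20),
    ValidationPoint(15.9, 14.0, u_numerical=0.05, u_input=0.40, u_comparator=0.25),
]
for pt in points:
    print(f"E = {comparison_error(pt):+.2f}, u_val = {validation_uncertainty(pt):.2f}, "
          f"covered = {within_uncertainty(pt)}")
```

A comparison error that exceeds the combined uncertainty, as in the second hypothetical condition, suggests a model form error that cannot be explained by numerical, input, or measurement uncertainty alone.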


4.5.2 Definition and Examples


Mechanistic models rely on four distinct elements: governing equations (i.e., the
mathematical formulation of the modelled process or phenomenon), system configuration (i.e.,
the device geometry or in vitro system), system properties (i.e., material properties or 
physiological parameters) and system conditions (i.e., initial, boundary and loading con- 
ditions). Model inputs include initial conditions and parameter values related to model 
set-up, such as: 


— part geometry specifications, 

— medical imaging settings, 

— material model frameworks and ranges, 

— boundary conditions, 

— specific patient descriptors such as diet, age, weight, or comorbidities for a drug model. 


The assessment of the model inputs can be divided into the quantification of sensitivities 
and uncertainties. Sensitivity analysis is concerned with how variations in input param- 
eters propagate through the simulation and their relative impact on the output(s). The 
sensitivity study results in a rank-ordered assessment of model input parameters from 
dominant to negligible impact. On the other hand, quantifying uncertainties enables mod- 
elers to propagate known or assumed model input uncertainties to uncertainties in the 
model predictions. The uncertainty analysis provides an error bar (or confidence interval) 
associated with each model output. In some scenarios, collecting model inputs is the lim- 
iting factor in the credibility assessment. However, the same is often true for all evidence 
collected by designed experiments or observation. 
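
A rank-ordered sensitivity assessment of the kind described above can be sketched with a one-at-a-time, central-difference estimate of normalised sensitivities. The model (tip deflection of an end-loaded cantilever beam, F L^3 / (3 E I)), the nominal values, and the function names are assumptions chosen for illustration. Global methods may be preferable when inputs interact strongly.

```python
def tip_deflection(params: dict) -> float:
    """Tip deflection of an end-loaded cantilever beam: F * L**3 / (3 * E * I)."""
    return params["F"] * params["L"] ** 3 / (3.0 * params["E"] * params["I"])


def normalised_sensitivities(model, nominal: dict, rel_step: float = 0.01) -> dict:
    """Central-difference estimate of d(ln output)/d(ln input) for each input."""
    base = model(nominal)
    result = {}
    for name, value in nominal.items():
        up = dict(nominal, **{name: value * (1.0 + rel_step)})
        down = dict(nominal, **{name: value * (1.0 - rel_step)})
        result[name] = (model(up) - model(down)) / (2.0 * rel_step * base)
    return result


# Hypothetical nominal inputs: load (N), length (m), Young's modulus (Pa),
# and second moment of area (m^4).
nominal = {"F": 10.0, "L": 0.1, "E": 110e9, "I": 1e-10}

sens = normalised_sensitivities(tip_deflection, nominal)
ranked = sorted(sens.items(), key=lambda kv: abs(kv[1]), reverse=True)
for name, value in ranked:
    print(f"{name}: {value:+.3f}")
```

In this sketch the beam length dominates, since a 1% change in L changes the deflection by roughly 3%, while the other inputs each contribute about 1%. Such a ranking indicates where effort in characterising input uncertainty is best spent.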

Typically, validation assessment is framed around comparing the model input(s) and 
output(s) to experimental data—i.e., the comparator—obtained in a set-up that is well- 
characterised, well-controlled, and relevant to the CoU. This situation corresponds to 
category 3 of credibility evidence outlined in the previously discussed FDA guidance 
(see footnote 10) describing sources of model credibility evidence (Table 4.1). The defi- 
nition of the comparator should include consideration of both the test samples and the test 
conditions, where each of these can be defined by their quantity, uncertainty, and other 
descriptors. An assessment of the validation activities should also be used to establish the 
similarity of model inputs to those of the comparator and the similarity of the outputs 
and quality of the output comparison. An example of the various elements of a validation 
study is provided in Table 4.2. 

ASME VV-40:2018 provides a framework to demonstrate that a model captures the 
physics of a medical device by comparison to a well-controlled benchtop test. However, 
a model used as part of an in silico clinical trial must be shown to reproduce clinical 
findings. And while the ASME VV-40:2018 standard refers to clinical trials as possible 
comparators, detailed considerations are not provided. However, as outlined in the FDA 
guidance on model credibility evidence (see footnote 10), a clinical data comparator may
be based on in vivo tests performed to support a CoU or population-based data from a 
clinical study (published, retrospective, or prospective) addressing a similar question of 
interest. And while the clinical comparator does not have to be targeted to the same device 
or drug of interest, it should be reasonably similar to ensure appropriate applicability (see 
Sect. 4.5.2). Examples of validation for a clinical comparator are provided in Tables 4.3 
and 4.4. 


4.5.3 Validation Layers for in Silico Methodologies 


To provide rigorous validation of models used to represent preclinical and clinical studies, 
the ICH E11(R1) guideline on clinical investigation of medicinal products in the paediatric 
population suggests starting from "pharmacology, physiology and disease considerations". 
As such, the following three layers are suggested: 


Physiological layer: This model describes the underlying physiology of a human or animal
system, which could model the treatment at the molecular, cellular, organ, or organ-system
scale. Associated QoIs are in qualitative or quantitative agreement with an
appropriate comparator measured from a healthy system.


Pathological layer: This model describes the disease processes of a human or animal
system, which could model the treatment at the molecular, cellular, organ, or organ-system
scale. Associated QoIs are in qualitative or quantitative agreement with an appropriate
comparator measured from a pathological system.


Treatment layer: This model describes the treatment effect on a physiological or
pathological human or animal system (which could model the treatment at the molecular,
cellular, organ, or organ-system scale). The model could be used to evaluate whether the
produced QoIs are in qualitative or quantitative agreement with an appropriate comparator.

A strict distinction between these layers is not always possible. For example, the lay- 
ers may be intertwined in the computational model (e.g., physiological and pathological 
layers) or even non-existent (e.g., a physiological layer doesn't apply when simulating an 
in vitro experiment). 

Most guidelines recommend describing the assessment of model form in detail, defined
in ASME VV-40:2018 as "the conceptual and mathematical formulation of the computational
model". Where present, each of the three layers needs to be described separately
and in full detail as appropriate for each modelling activity. We also recommend reporting
the results of a validation activity for each layer, using the method described above.
This is the so-called hierarchical validation approach, which we recommend even where no
regulatory guideline or standard requires it, because there is always the theoretical
possibility that the errors of one layer hide those of another layer.


4.5.4 Uncertainty Quantification


The quantitative impact of model uncertainty is estimated by observing how input uncer- 
tainties affect the model's outputs. If a particular input is individually measured, the 
uncertainty is related to the reproducibility of the measurement chain in use; otherwise, 
it is the distribution of possible values that the input can assume in the reference pop- 
ulation. As outlined in Sect. 4.1, performing such analysis is essential within risk-based 
frameworks. 

An elegant theoretical framing of the role of uncertainty quantification in decision- 
making can be found in (Farmer, 2017) and reviews of numerical methods for sensitivity 
analysis used in other industrial sectors can be found in (Cartailler et al., 2014; Schaefer 
et al., 2020). Some early examples are available for pharmacokinetic models (Farrar et al., 
1989), cardiac electrophysiology models (Mirams et al., 2016; Pathmanathan & Gray, 
2014; Pathmanathan et al., 2015), models for physiological closed-loop controlled devices 
for critical care medicine (Parvinian et al., 2019), models of intracranial aneurysms (Berg 
et al., 2019; Sarrami-Foroushani et al., 2016) and in systems biology models (Villaverde 
et al., 2022). 

Uncertainty quantification is a stepwise process (Roy & Oberkampf, 2011) that begins 
with identifying all sources of uncertainty, followed by quantifying these uncertainties. 
Uncertainties are then propagated through the model to provide the system response, 
which can be expressed through probabilities under a given confidence interval. A detailed 
technical explanation and an illustrative example can be found in (Roy & Oberkampf, 
2011). The type of uncertainty quantification method (e.g., intrusive, non-intrusive, via 
surrogate models, etc.) should be chosen appropriately according to the model under 
investigation (Nikishova et al., 2019; Smith, 2013). 
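
The stepwise process described above can be illustrated with a minimal, non-intrusive Monte Carlo sketch. Everything specific in it is an assumption made for illustration only: the toy model `toy_model`, the two uncertain inputs, their normal distributions, and the sample size. None of it is taken from Roy & Oberkampf (2011) or from any regulatory submission.

```python
import random
import statistics


def toy_model(clearance, dose):
    """Hypothetical model: exposure taken as dose divided by clearance."""
    return dose / clearance


def propagate(n_samples=10_000, seed=0):
    """Sample the uncertain inputs, run the model, summarise the output."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_samples):
        # Steps 1-2: identified uncertainty sources, with assumed distributions.
        clearance = rng.gauss(5.0, 0.5)  # stands in for inter-subject variability
        dose = rng.gauss(100.0, 2.0)     # stands in for a measurement error
        # Step 3: propagate the sampled inputs through the model.
        outputs.append(toy_model(clearance, dose))
    outputs.sort()
    # Step 4: express the system response as an empirical 95% interval.
    lower = outputs[(n_samples * 25) // 1000]
    upper = outputs[(n_samples * 975) // 1000 - 1]
    return statistics.mean(outputs), lower, upper


mean, lower, upper = propagate()
print(f"mean={mean:.2f}, 95% interval=({lower:.2f}, {upper:.2f})")
```

Plain sampling is only one of the non-intrusive options mentioned above; for an expensive model, a surrogate or a more efficient sampling scheme would probably be preferred.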

The discussion so far has implicitly assumed that all model inputs are affected by a 
quantifiable uncertainty, e.g., due to measurement errors. But in some cases, certain inputs 
do not refer to an individual but rather to a population, and the uncertainty is dominated 
by inter-subject variability. In this case, some authors use the term "prediction interval", 
which encapsulates quantification uncertainties and inter-subject variabilities (Tsakalozou 
et al., 2021). 


4.5.4.1 Clinical Interpretation of Validation Results 

This section is inspired by EMA's distinction between technical and clinical validation, 
which is outlined in the guidance for “Qualification of novel methodologies for medicine
development”¹² and suggested in a letter of support to a request for qualification advice on
the use of digital mobility outcomes (DMOs) as monitoring biomarkers. This letter stated, 
“The technical validation will verify the device's accuracy and algorithm to measure a 
range of different DMOs. [...] clinical validation will be obtained in an observational 


12 https://www.ema.europa.eu/documents/regulatory-procedural-guideline/qualification-novel-methodologies-drug-development-guidance-applicants_en.pdf.

multicentre clinical trial” (see also (Viceconti et al., 2020) for more details). While a
letter of support is not the most authoritative source, we are unaware of any official 
source providing such definitions. 

The distinction that some regulators make between technical and clinical validation of 
new methodologies comes from quantitative biomarkers. Technical validation deals with 
the accuracy with which the quantification is done (i.e., a metrology problem of accuracy 
and precision estimation); while clinical validation deals with the validity of using such 
measurements as evidence in a specific regulatory decision. Traditionally, the accuracy 
with which quantitative biomarkers are measured is high, so technical validation is con- 
sidered necessary but not critical. On the other hand, the relationship between a specific 
biomarker value and the clinical outcome is usually very complex, so clinical validation 
is considered the challenging part of assessing a new methodology. The complexity of the 
relationship between biomarker value and the clinical outcome is also an important reason 
why clinical validation is usually framed in terms of frequentist statistics. The expectation 
is that prior knowledge about such a relationship is scarcely informative. Thus, the valid- 
ity of using the biomarker as a predictor of the clinical outcome is qualified only through 
an extensive induction, where a very large number of clinical experimental validations are 
required. 

A clinical interpretation of the model validation is an assessment of the clinical cred- 
ibility of the predicted quantities, i.e., in situations where the comparator used in the 
validation is population-based data collected as part of a clinical trial. This clinical inter- 
pretation may be required to satisfy regulatory requirements depending on the CoU. 
However, there is very little experience and no published guidelines from the regula- 
tory agencies on this topic. For the time being, we propose to extract and interpret key 
results of the validation process in a manner similar to the one used to demonstrate the 
regulatory validity of a conventional biomarker used as an outcome measure. 

For conventional biomarkers, which are measured experimentally, an outcome mea- 
sure is considered valid for a well-defined CoU when one can demonstrate its construct 
validity, its predictive capacity, and its ability to detect change, where: 


* Construct validity is “the extent to which the measure ‘behaves’ in a way consistent
with theoretical hypotheses and represents how well scores on the instrument
are indicative of the theoretical construct” (Killewo et al., 2010, page 199). Construct
validity is typically demonstrated by evaluating simulation results (or model
behaviour) against what is known (either quantitative data or qualitative knowledge).

* Predictive capacity provides evidence that measures can be used to predict outcomes. 
It is thus extracted from comparing the model input and output to experimental data 
obtained in a set-up that is well-characterised, well-controlled, and relevant to the CoU 
described in the previous section. As written above, experimental data used to demon- 
strate predictive capability should differ from the data used to develop and calibrate 
the model.

* The ability to detect change is the most critical aspect, as it relates directly to the
decision-making process at the core of the regulatory process. To demonstrate that 
a prediction can detect change, it is necessary to demonstrate longitudinal validity, 
minimal important difference, and responsiveness: 

— Longitudinal validity is the extent to which changes in the prediction will corre- 
late with changes in the outcome over time or with changes in measures that are 
accepted surrogates for the outcome. Whereas predictive capacity is the correlation 
of the prediction with the outcome at a given time point, longitudinal validity is the 
correlation of changes in the predictions with changes in the outcome over time. 
The relationship between the simulated output and the outcome of interest for the 
regulatory decision should be evaluated in a manner similar to any other biomarker. 
This relationship may be obvious when the model output is a clinical endpoint or 
easily supported if the model predicts a validated biomarker (e.g., a validated surro- 
gate endpoint in the example of tuberculosis vaccine efficacy). However, the model 
would likely need more supportive evidence if the output does not fall into one of 
these two categories. This question is mostly treated in the definition of the CoU, 
where the model's use to answer the question of interest is described and justified. 

— The Minimal Important Difference (MID) is the smallest change in the outcome 
identified as important in the patient's and doctor's opinion. This requires answer- 
ing the following question: is the model precise enough to detect the MID for the 
outcome of interest? The answer to this question is extracted from the uncertainty 
quantification, representing the prediction confidence interval. The prediction con- 
fidence interval cannot include at the same time the MID value and the null value 
(e.g., absence of difference). 

— Responsiveness to the treatment is the most important attribute for establishing the 
clinical validity of a predicted biomarker. It can also be described as the model's 
ability to estimate a clinical benefit. It should be possible to estimate the expected 
clinical benefit from the estimated predicted change of the simulated result. This last 
aspect is closely linked to the validation of the treatment layer, where the modelling 
of the treatment's impact on the system of interest is evaluated. 
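
The MID criterion stated above (the prediction confidence interval cannot include the MID value and the null value at the same time) can be written as a simple check. This is a sketch under stated assumptions: the function name `interval_separates_mid_from_null`, the closed-interval convention, and the numbers in the usage lines are all hypothetical.

```python
def interval_separates_mid_from_null(ci_lower, ci_upper, mid, null_value=0.0):
    """Return True if the interval does not contain both the MID and the null value."""
    if ci_lower > ci_upper:
        raise ValueError("ci_lower must not exceed ci_upper")
    contains_null = ci_lower <= null_value <= ci_upper
    contains_mid = ci_lower <= mid <= ci_upper
    # The criterion only fails when both values fall inside the interval.
    return not (contains_null and contains_mid)


# Hypothetical predicted treatment effects, with an assumed MID of 2.0 units.
print(interval_separates_mid_from_null(0.5, 2.5, mid=2.0))   # interval excludes the null
print(interval_separates_mid_from_null(-0.5, 2.5, mid=2.0))  # interval spans null and MID
```

In practice, the interval bounds would come from the uncertainty quantification of the model, and the MID from a dedicated study for the outcome of interest.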


Reframing the VVUQ results according to this entirely different credibility logic poses several
challenges. Many in silico methodologies can directly predict the primary clinical outcome
or at least a QoI that is already accepted as a valid construct for that specific regulatory
decision. In this case, construct validity is already ensured. Otherwise, this evidence needs
to be generated using the same approaches used for an experimental QoI; for example,
by demonstrating convergent and discriminant validity (see for example the systematic
review by Xin and McIntosh (2017)). Predictive capacity and longitudinal validity are
two sources of evidence that fit well with the concept of validation according to ASME
VV-40:2018. The concept of minimal important difference is also implicit in the VVUQ
framework. If the QoI is already accepted in the regulatory practice as a measured value,
then there is a good chance that an MID value has already been estimated. Again, an MID
study is required if the model predicts a new QoI. The need to demonstrate responsiveness
is the one most frequently debated. This typically requires a narrowly defined CoU
(specific disease, even a specific range of disease progression, specific class of treatments
to be tested) and one or more fully randomised interventional clinical trials, possibly
conducted by someone independent from the proponents of the in silico methodology.

But all this makes sense only if we consider an L2 validation (see Chap. 2), where the 
validation expectation is that the model predicts with sufficient accuracy a central property 
(e.g., the mean) of the distribution of a certain QoI as observed in a well-defined sub- 
population. Because the implicit assumption is that the accuracy of the predictor may vary 
as a function of how we choose the validation sub-population, responsiveness assessment 
requires validation with a sub-population that is as close as possible to the parameter 
space used by the in silico methodology (hence the need for a narrow definition of
the CoU). But all this would not be valid if the in silico methodology is tested at an L3
level of validity. In this case, each prediction is subject-specific, and the predicted QoI 
is compared to that measured on the same subject. For such validation, in our opinion, 
the concept of clinical responsiveness collapses into that of Applicability according to
ASME VV-40:2018. Ideally, in an L3 validation study, the predictive accuracy of the in
silico methodology is evaluated for the widest possible range of patients, severity of the 
disease, type of treatments, and even across multiple diseases where this is applicable. 
This is a critical point that will need to be addressed. 


4.6 Applicability of the Validation Activities 


Applicability represents the relevance of the validation activities to support using the 
computational model for the CoU. It includes (i) a systematic review of all validation 
evidence supporting the use of the model in the CoU, (ii) a precise comparison of the 
validation context, including both the QoIs and conditions of simulations (e.g., simulated 
population or experimental conditions and the range of conditions studied) and (iii) a 
rationale justifying model use despite the potential differences between the validation 
conditions and the requirements of the CoU. These comparisons are critical since any 
differences or shortcomings can reduce the overall credibility of the model to answer the 
question of interest, even in situations where their validation assessment is sufficient. We
refer the reader to the framework of Pathmanathan et al. (2017),
which provides step-by-step instructions for determining validation applicability.
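
As a rough illustration of point (ii), the comparison between validation conditions and CoU conditions can be sketched as a range-coverage check. This is not the procedure of Pathmanathan et al. (2017); the function `uncovered_conditions`, the condition names, and the numeric ranges are hypothetical, and a real applicability analysis would also cover the QoIs and qualitative differences.

```python
def uncovered_conditions(validated_ranges, cou_ranges):
    """Return the CoU conditions that the validation ranges do not fully cover."""
    gaps = {}
    for name, (cou_lo, cou_hi) in cou_ranges.items():
        if name not in validated_ranges:
            gaps[name] = "not validated"
            continue
        val_lo, val_hi = validated_ranges[name]
        # A gap exists when the CoU range extends beyond the validated range.
        if cou_lo < val_lo or cou_hi > val_hi:
            gaps[name] = f"CoU [{cou_lo}, {cou_hi}] exceeds validated [{val_lo}, {val_hi}]"
    return gaps


# Hypothetical conditions: each reported gap would need a rationale, as in point (iii).
validated = {"age_years": (40, 80), "bmi": (18, 35)}
cou = {"age_years": (50, 85), "bmi": (20, 30), "egfr": (30, 90)}
for condition, reason in uncovered_conditions(validated, cou).items():
    print(condition, "->", reason)
```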

In analogy to what is proposed to evaluate the applicability of the analytical validation
activities for biomarkers, we recommend describing and assessing the collection/acquisition,
preparation/processing, and storage of the comparator data.¹³


13 U.S. Department of Health and Human Services, Food and Drug Administration, Center for
Drug Evaluation and Research (CDER), and Center for Biologics Evaluation and Research (CBER),

4.7 VVUQ Considerations for Data-Driven Models
and Agent-Based Models 


The logic behind credibility assessment through VVUQ plus applicability analysis 
assumes implicitly that the model being assessed is mostly knowledge-driven, and that 
the causal knowledge used to build the model has resisted extensive falsification attempts
and can thus be considered a scientific truth. Furthermore, the causal knowledge used to build
the model is expressed in terms of mathematical equations. In such a case, the credibility 
assessment aims to quantify the prediction error and decompose it into its components 
(numerical approximation, aleatoric, and epistemic errors). Then the applicability anal- 
ysis confirms that the prediction error observed in the validation studies represents an 
acceptable prediction error across the range of possible input values. 

The extension of this reasoning to data-driven models poses some problems. Here 
we only give a summary; please refer to Chap. 2 for an in-depth discussion. Validation 
of data-driven models can be performed by calculating the predictive accuracy against 
one or more annotated datasets (e.g., results of experimental studies where the QoI is
measured together with all input values of the model), as long as these datasets were not 
already used to train the model (test sets). In knowledge-driven models, the epistemic 
errors are limited to how we implement reliable knowledge in our model; in data-driven 
models, epistemic errors are not bounded a priori. Numerical approximation errors do 
not exist when there are no equations to solve; hence some verification aspects may not 
apply (whereas others, such as software quality assurance, remain). Applicability analysis 
assumes a certain degree of smoothness in how the prediction error varies over the range 
of possible input values. While for knowledge-driven models, this assumption descends 
from the properties of the equations that represent the knowledge, such an assumption 
is not guaranteed in data-driven models. In principle, an artificial neural network model 
could have a predictive error of 10% of the measured value for a given set of input values, 
and an error of 100% for another set of inputs, even if those are quite close to the first 
set. But the bigger difference is related to the risk of concept drift that all data-driven 
models face. Data-driven models make predictions by analysing the correlations between 
inputs and outputs over a set of experimental measurements. Concept drift means the 
predictive accuracy of a data-driven model decreases over time. This may happen for 
several reasons, for example, if the data sample used to train the data-driven model is no 
longer fully representative of the phenomenon being modelled. While there are techniques 
to reduce this problem, there is never absolute certainty that concept drift will not occur. 
This is why there is a growing consensus that the credibility of data-driven models must 


“Biomarker Qualification: Evidentiary Framework Guidance for Industry and FDA Staff DRAFT 
GUIDANCE,” 2018. https://www.fda.gov/media/119271/download.

be framed in a Predetermined Change Control Plan, where the predictive accuracy is
re-assessed on newly collected data.¹⁴
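
The periodic re-assessment described above can be sketched as a comparison between the error on a newly collected batch and a predetermined acceptance threshold. This is an illustrative sketch, not a procedure prescribed by the FDA: the error metric (mean absolute percentage error), the 10% threshold, the function names, and the data are all assumptions.

```python
def mean_absolute_percentage_error(measured, predicted):
    """Mean of |predicted - measured| / |measured|, returned as a fraction."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    # Assumes no measured value is zero; a real plan would handle that case.
    return sum(abs(p - m) / abs(m) for m, p in zip(measured, predicted)) / len(measured)


def reassess(measured, predicted, max_error=0.10):
    """Return (error, still_acceptable) for a newly collected batch."""
    error = mean_absolute_percentage_error(measured, predicted)
    return error, error <= max_error


# Hypothetical batches: the second one mimics the effect of concept drift.
print(reassess([10.0, 20.0, 30.0], [10.5, 19.0, 31.0]))
print(reassess([10.0, 20.0, 30.0], [13.0, 15.0, 36.0]))
```

A failed re-assessment would then trigger whatever action the plan specifies in advance, for example retraining the model or restricting its context of use.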

Agent-based models (ABMs) are another class of predictive models used in biomedicine.
They are a generalisation of the concept of cellular automata, which was first proposed
in the 1940s. Most ABMs are formulated in terms of rules that decide, at each time step,
the state transitions of the autonomous agents in the simulation.
The key point here is how such rules are defined. If the rules are defined
empirically, for example by analysing experimental data, the credibility of that agent- 
based model should be assessed in a manner similar to data-driven models, with all the 
implications mentioned above. On the other hand, if the rules descend from quantitative 
knowledge that has resisted extensive falsification attempts, then the agent-based model 
should be considered a knowledge-driven model. However, in this second case, some 
differences apply due to the fact that the knowledge that drives the model is not expressed 
in terms of mathematical equations, but in terms of rules. This makes the concept of 
verification more complex to define (see for example (Curreli et al., 2021)). 

As this field evolves, more and more sophisticated models will appear. For exam- 
ple, some problems may require a combination of data-driven and knowledge-driven 
modelling to make accurate predictions. The definition of the correct credibility assess- 
ment process for such hybrid models is challenging and cannot be generalised at this 
time. As a rule of thumb, each model should be classified as predominantly data-driven 
or as predominantly knowledge-driven, and the credibility assessment process should stem 
from such classification. 


4.8 Final Credibility 


Once the credibility assessment is completed, it must be determined if the model is 
sufficiently credible for the CoU. Note that the CoU can be modified, and the cred- 
ibility assessment repeated if the model fails the credibility assessment. Alternatively, 
the model itself, or the credibility activities, can be revisited and improved to reach 
the required level of model credibility. A comprehensive summary of the computational 
model, model results and conclusions must be documented and archived upon conclusion 
of the modelling project. 


14 FDA white paper "Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device
(SaMD) Action Plan", Jan 2021. https://www.fda.gov/media/145022/download.

4.9 Essential Good Simulation Practice Recommendations


— The credibility of in silico methodologies based on predominantly mechanistic models 
can be effectively demonstrated following the risk-based approach to model verifica- 
tion, validation and uncertainty quantification as detailed in the ASME VV-40:2018 
technical standard. The credibility of methodologies based on predominantly data- 
driven models should follow a Predetermined Change Control Plan, where the model's 
credibility is periodically retested using new test data. 

— Where applicable, the validation of predominantly mechanistic models should be per- 
formed separately for the physiological modelling layer, the disease modelling layer, 
and the treatment modelling layer.

— Regulators qualifying in silico methodologies to be used as drug-development tools 
expect that prior knowledge is generally scarcely informative. 

— Regulators currently require that in silico methodologies used as drug-development 
tools are qualified following the same regulatory framework used for experimental 
methodologies. In particular, the technical validation is expected to be separated from 
the clinical validation. 

— In analogy to what is proposed to evaluate the applicability of the analytical valida- 
tion activities for biomarkers, we recommend describing and assessing the collection/ 
acquisition, preparation/processing, and storage of the comparator data used to validate 
in silico methodologies.

5.1 Introduction


Regulatory science is ultimately a matter of trust. You need to trust that certain evidence, 
when obtained with certain methodologies, is sufficient to inform about a new medical 
product’s safety and/or efficacy. Trust is formed based on previous experience but is also 
informed by the educational background of the experts involved and, in particular, how 
they decide when a belief can be considered true. And when previous experience is scarce, 
the educational background drives the decision to trust a new methodology. 

Medical device regulators build their regulatory science using an epistemology that 
is at least partially that of physical sciences. In this context, it is common to expect 
quantitative experimental results, measurement methodologies mostly free of systematic 
errors (unbiased), and prior knowledge from fundamental laws of physics and chemistry 
to be frequently informative. Under these expectations, the inference is mostly Bayesian
in that the posterior probability is proportional to the product of the likelihood observed through
controlled quantitative experiments and the prior probability that existing knowledge
provides. There is also an expectation that the prior probability and the likelihood are quite
similar because the prior knowledge in use has frequently resisted extensive falsifica- 
tion attempts, which is the theoretical basis of the concept of validation. This opens the 
door to using in silico methodologies to reduce, refine, and partially replace experimental 
methodologies. 
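
As a reminder of the standard form of Bayes' theorem that underlies this reasoning, the relationship can be written as follows. The notation is ours and not the chapter's: $\theta$ stands for the quantity being inferred and $D$ for the experimental data.

```latex
p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)} \;\propto\; p(D \mid \theta)\, p(\theta)
```

Here $p(\theta)$ is the prior provided by existing knowledge, $p(D \mid \theta)$ is the likelihood observed through controlled experiments, and $p(\theta \mid D)$ is the posterior.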

Drug regulators build their regulatory science using an epistemology that derives from 
natural and social sciences. In this context, it is common to expect experimental results 
that are qualitative or semi-quantitative. Even when quantitative results are available, there 
is an expectation that they may be affected by considerable systematic errors caused by 
selection, information, and confounding biases. There is also the expectation that prior 
knowledge is scarcely informative due to the complexity and the non-linearity of the 
phenomena under investigation. Under these expectations, inference is mostly frequentist.
Prior knowledge (and thus in silico methodologies based on it) can, at most, be used 
to inform the design of experimental studies, and replace them only when experimental 
studies are impossible. 

As medical products (and the technology to test them) evolve, these expectations need 
to change. But such change will not happen overnight. Trust in the in silico methodologies 
will grow as they demonstrate their validity when used as clinical technologies and as clin- 
ical research tools in pre-regulatory settings. But in parallel, it is also necessary to break 
down the cultural walls that separate the regulatory science for medical devices from that 
for drugs. Scientific advisory panels should become more interdisciplinary and represent 
all expertise. Targeted re-training programs are also necessary to inform the staff of regulatory
agencies of the opportunities and risks these innovative technologies
pose.

Another possibility discussed in this chapter is to modify the current regulatory qual- 
ification pathways for in silico methodologies. This might allow for the optimal use of 
expertise within regulatory agencies to provide a thorough and balanced qualification 
process. In the following sections, we discuss possible alternative pathways to provide 
elements for reflection to regulatory agencies. 


5.2 Pre-certification as Predictive SaMD 


Most regulatory authorities nowadays recognise software with a medical purpose as a 
special class of medical devices called Software as a Medical Device (SaMD). The International
Medical Device Regulators Forum (IMDRF) defines SaMD as “software intended
to be used for one or more medical purposes that perform these purposes without being
part of a hardware medical device.” FDA CDRH and the EU CE-marking process both
include established regulatory pathways for these technologies. A special case is that of 
SaMDs with predictive capabilities. Examples of this new class of SaMD are solutions
for fractional flow reserve (Zarins et al., 2013), planning software for transcatheter aortic
valve replacement (Halim et al., 2021), or software to predict the risk of hip fracture from
CT data (Keaveny et al., 2020). A recent FDA draft guideline confirms that even for these
solutions, ASME VV-40:2018 can be used to assess the credibility of the predictive
models.

A first possible regulatory pathway for in silico methodologies could be to require that 
any evidence supporting the marketing authorisation of a new medical product (whether 
the medical product is a medical device, a drug, or an ATMP), if obtained in silico, should
be produced with technologies certified as predictive SaMD. Once the in silico method- 
ology is certified as a predictive SaMD, its qualification as a medical device or drug 
development tool would be limited to the clinical validation aspects. 

The main limitation of this approach is that not all in silico methodologies are patient- 
specific models, and thus their framing into a medical purpose might be impossible. 
Another potential issue is that some safety requirements that are indispensable for medical 
purposes might not be necessary when the model is used as an in silico methodology;
thus, this pathway might be unnecessarily severe for some solutions. On the
other hand, it would simplify the regulatory process for solutions intended as SaMD and 
in silico methodologies, as the SaMD certification would cover the technical validation in 
the qualification process. 


5.3 Certification of the Technical Validity 


A more limited version of the SaMD pathway could be a certification of the technical 
validity according to ASME VV-40:2018 or other similar standards or technical guidance
provided by FDA CDRH or EU notified bodies. Once an in silico methodology has
such certification, the qualification as a medical device or drug development tool would 
focus only on clinical validation. 

The main limitation of this approach is the need to establish an accreditation process 
for bodies with the relevant expertise that can produce a credibility certification according 
to some internationally accepted technical standard. 


5.4 Towards an Ad Hoc Qualification Pathway for In Silico 
Methodologies 


A third possible strategy could be to recognise that the qualification of in silico method- 
ologies requires a specialised panel, regardless of whether they are used to develop drugs 
or medical devices. This would imply creating an ad hoc process that cuts through most 
regulators’ current organisations built on the distinction between drugs and devices. The
scientific advisory panel would include the same expertise normally found in qualification
panels but also experts in in silico methodologies, who are qualified to evaluate the most
technical aspects.

The main limitation of this approach is that an ad hoc qualification pathway would 
need to be created. In the US, such a scenario could be realised through a collaboration 
between CDER, CBER, and CDRH, where the management of such an ad hoc qualifi- 
cation pathway could be delegated to one of the three FDA centres or operated under a 
collaborative model. This approach would be more complicated in Europe, given that no 
central authority for medical devices exists. 


5.5 Adapting the Existing Qualification Pathways to In Silico 
Methodologies 


The least disruptive approach would be to embed the technical review process into the 
existing qualification pathways. The FDA provides qualification pathways for medical 
device and drug development tools, whereas the EMA provides one only for drug devel- 
opment tools. Qualifying a new methodology is not mandatory but highly recommended, 
especially for innovative methodologies. Seeking qualification for a method provides an 
early engagement with the regulatory agencies and will facilitate the integration of this 
tool into various product development processes. 

Currently, a new methodology is qualified for regulatory use by first requesting qual- 
ification advice on the process intended to be used to demonstrate the validity of the 
new methodology in that CoU. If the authority agrees with the approach, the next step 
is to conduct the planned validation studies and request a formal qualification opinion. 
A positive draft qualification opinion is then made public to debate the adequacy of the 
validation evidence. The new methodology is confirmed in its final form if no criticisms 
emerge from the expert review. A developer can use that methodology to produce evidence 
in a marketing authorisation application for a new product without providing additional 
information on the methodology. 

Existing qualification pathways are separated by the type of medical product: so 
there are pathways for drug development tools (e.g., small molecules, biologics, ATMPs, 
microbiome-derived products), and for medical device tools. They currently focus on clin- 
ical validation of the methodology rather than on its technical validation. For example, 
in a recent qualification opinion of EMA on a digital health methodology,¹ the only reference
to the technical validity of the new methodology is in a footnote, and the only
quantitative requirement is: “The length and velocity of the strides should be accurately
measured with an error at 1 sigma (68% confidence interval) under 2.5%.”


1 https://www.ema.europa.eu/documents/scientific-guideline/qualification-opinion-stride-velocity-95th-centile-secondary-endpoint-duchenne-muscular-dystrophy_en.pdf.

One major limitation of this approach is that qualification pathways are not available
for all types of products worldwide. Qualification procedures exist both in the EU and in 
the US. However, it should be noted that while the FDA provides a qualification procedure
for new methodologies used to develop new drugs² and new medical devices,³ such a
pathway is available in the EU only for methodologies used to develop new drugs.⁴ There
are no qualification pathways for medical device development tools in Europe, a major 
hurdle when advanced, complex, innovative methodologies such as in silico methodolo- 
gies are being proposed. Another issue is the focus of existing scientific advisory panels 
on the clinical validation aspects. In silico methodologies require a thorough credibility 
assessment, requiring experts to evaluate the dossier properly. Therefore, using existing 
qualification pathways also for in silico methodologies would require the extension of 
these panels to include experts in computational methodologies. 

Another issue is that in silico methodologies are sometimes developed to address a 
specific use case relevant only to that product. In this scenario, it would be more conve- 
nient to include the evidence of credibility for the in silico methodology in the marketing 
authorisation dossier rather than undertaking a separate qualification procedure. 


5.6 Essential Good Simulation Practice Recommendations 


— Regulatory agencies should increase the interdisciplinarity of scientific advisory panels 
and develop targeted staff re-training programs on the opportunities and risks those
innovative technologies pose. 

— Regulatory agencies should explore whether existing qualification pathways should be 
adapted to include in silico methodologies properly or if creating new qualification 
pathways for these methodologies is more prudent. 


2 https://www.fda.gov/drugs/development-approval-process-drugs/drug-development-tool-ddt-qualification-programs.

3 https://www.fda.gov/medical-devices/science-and-research-medical-devices/medical-device-development-tools-mddt.

4 https://www.ema.europa.eu/en/human-regulatory/research-development/scientific-advice-protocol-assistance/qualification-novel-methodologies-medicine-development-0.

6.1 Introduction


Two intersections exist between in silico methodologies and Health Technology Assessment
(HTA). The most obvious is when in silico methods are used as predictive Software
as a Medical Device (SaMD), e.g., as clinical decision support systems. In such cases, HTA
is used, as for any other medical technology, to evaluate whether its adoption is cost-effective
and clinically appropriate (i.e., it meets the criteria under which a certain intervention is
properly prescribed to a patient; according to the Italian Medicines Agency (AIFA),
appropriateness is defined as “adequacy of the actions adopted to manage a disease,
concerning both the patient's needs and the correct use of resources” (Garattini & Padula, 2017)).

A second intersection is when the use of In Silico Trials in the regulatory qualification 
of a medical product affects the HTA of that product. Using In Silico Trials to replace, 
reduce or refine human experimentation could improve the ability to detect change, which 
would translate into a more sensitive assessment of differences in efficacy/performance. 
It could also provide an efficacy/performance assessment closer to real-world effectiveness, 
as virtual patients may make it easier to explore efficacy in sub-groups that are 
under-represented in clinical trials. In addition, in silico methodologies could reduce or 
replace Phase 4 trials, produce early estimates of quantities of interest for the HTA of 
medicinal products, and support the so-called early discourse on HTA. This second 
perspective is the focus of this chapter. 


6.2 Assessing in Silico Methodologies for HTA 


Modelling and simulation is a constantly expanding field, in terms of both methodology 
and application sectors. As models grow in complexity and uptake, it becomes essential 
to clarify the most appropriate tools for evaluating in silico methodologies, especially 
those that can contribute to HTA. 

Developers often make very strong claims with poor reporting and/or weak VVUQ 
evidence supporting those models. These tools still have a long way to go in terms of 
implementation and public adoption, as well as rigour in their use, which can be incon- 
sistent and unbalanced at the moment (Musuamba et al., 2021). This chapter aims to 
provide input on the scientific evaluation of in silico methodologies of health interven- 
tions (drugs and other technologies) from the HTA point of view and the role that such 
technologies can play in HTA. 


6.3 Introduction to Health Technology Assessment (HTA) 


HTA is a multidisciplinary process that uses explicit methods to determine the value of 
health technology at different points in its lifecycle. The purpose is to inform decision- 
making to promote an equitable, efficient, high-quality health system.¹ In many countries, 
it is now common to perform this systematic and multidimensional evaluation of health 
technologies aimed at informing coverage, reimbursement, or pricing decisions within 
public healthcare systems. 

The process is formal, systematic, and transparent, using state-of-the-art methods to 
consider the best available evidence. The dimensions of value for a health technology may 
be assessed by examining the intended and unintended consequences of using a health 
technology compared to existing alternatives. These dimensions often include clinical 
effectiveness, safety, costs and economic implications, ethical, social, cultural, and legal 
issues, organisational and environmental aspects, and wider implications for the patient, 
relatives, caregivers, and the population. The overall value may vary depending on the 
perspective taken, stakeholders, and decision context. 

HTA can be applied at different points in the lifecycle of health technology, i.e., pre- 
market, during market approval, post-market, and through the disinvestment of health 
technology. The approach and methods used at each phase will differ and depend on 
the available evidence (whether primary or secondary data) and the decision about the 
technology. 

Whilst licensing approval is mainly focused on the technical and safety profile of the 
medical device, HTA bodies have different interests and, therefore, different evidence 
requirements. Normally, the requirements aim to inform policymakers (and decision-makers 
in general) about the rational allocation of resources within finite budgets to the 
funding (or use) of healthcare interventions. For this reason, data required for market 
access might go beyond those used or developed for licensing, particularly for medical 
devices, where regulatory requirements have historically been low. 

This additional evidence generation could also be worthwhile from the manufacturer's 
perspective. With prepaid financing mechanisms for health systems through general tax- 
ation or private/social insurance, third-party payers’ coverage strongly influences market 
prospects for medical technology companies. For example, granting a CE mark does 
not imply that the product will be available to patients everywhere in the EU. If the 
HTA assessment leads to declined public reimbursement in a particular country, the vast 
majority of patients cannot afford the product in that country. 

It is important to mention that “health technology” is a broad concept. The accepted 
international definition of a health technology is an intervention developed to prevent, 
diagnose, or treat medical conditions; promote health; provide rehabilitation; or organise 


1 HTA Glossary. International Network of Agencies for Health Technology Assessment (INAHTA), 
Health Technology Assessment international (HTAi) and other partner organizations. Available at: 
http://htaglossary.net/HomePage. 
healthcare delivery. The intervention can be a diagnostic test, device, medicine, vaccine, 
procedure, program, or system.² As we explained in Chap. 1, in silico technologies can 
be used as medical devices, as they are used in the diagnostic, prognostic, or therapeutic 
process. In addition, they can be used to evaluate the safety, efficacy/performance, pre- 
scriptive appropriateness, and cost-effectiveness (HTA) of a new medical product, whether 
a medical device or a drug. This chapter will mainly focus on this second use, touching 
on the first in the final section of future challenges. 

It is also worth mentioning that modelling and simulation methods are frequently 
used to evaluate different types of (implemented) medical interventions, often in the 
context of HTA. These studies have mainly been used to supplement systematic reviews in an 
effort to increase the usefulness of the evidence summary. Uncertainty about the optimal 
choice among available interventions for important patient-relevant outcomes may persist 
even after synthesising the best available evidence. Indeed, decision-makers are 
increasingly interested in complementing the results of systematic reviews of empirical 
evidence with information from modelling and simulation studies, that is, in integrating 
empirical evidence on benefits and harms, values (preferences), and/or resource utilisation 
while accounting for all relevant sources of uncertainty (Dahabreh et al., 2008, 2017). 
The most frequent applications of this type of modelling and simulation are: 


— to synthesise data from disparate sources (modelling provides mathematical tools for 
evidence synthesis and the assessment of consistency among data sources), 

— to make predictions ("interpolations", forecasts, "extrapolations", prioritisation and 
planning), 

— to support causal explanations and infer the impact of interventions, or 

— to inform decision-making (about patient-level care, drug or device licensing, health 
care policy or the need to conduct additional research) (Dahabreh et al., 2017). 


Although this specific scenario of modelling and simulation, based on the combination of 
already existing evidence/data, could be considered an in silico methodology, it is not 
covered in this chapter, as good, up-to-date reviews are available (Dahabreh et al., 
2008, 2017; Jalali et al., 2021). 


2 HTA Glossary. International Network of Agencies for Health Technology Assessment (INAHTA), 
Health Technology Assessment international (HTAi) and other partner organizations. Available at: 
http://htaglossary.net/HomePage. 


6.4 In Silico Methodologies as a Source of Evidence 


Science generates evidence through observation, deduction, and induction. Simulation, 
like deduction, starts with specified assumptions regarding a proposed system and gener- 
ates data suitable for analysis by induction. However, this data does not come from direct 
observation in the real-world (Stahl, 2008). 

These assumptions can be derived from observed data, with predictions made as a 
function of the experimentally observed variability (phenomenological models), or by 
leveraging pre-existing knowledge about the physics, chemistry, physiology and biology 
of the phenomenon being modelled (mechanistic models) (Viceconti et al., 2020b). 

In silico methodologies can be a source of evidence when developing or validating a 
health technology, a pharmaceutical product or a medical device (model-based medical 
results or computational modelling and simulation results). These are predictive 
computer models that are used to provide evidence in support of the safety and/or 
efficacy/performance of a medical product during its marketing authorisation process. 
They can also be used during any assessment phase through the technology lifecycle and, 
thus, become part of the evidence to be used for HTA as well. 

Methodologies and tools used to produce regulatory evidence are usually qualified by 
the regulator or certified according to a specific technical standard such as, for example, 
the ASME VV-40 for the use of computational modelling to evaluate medical devices.³ 


6.4.1 Medical Devices and Interventions 


Computational modelling and simulation can help to increase the scientific evidence for 
evaluating high-risk medical products and interventions, especially when they enable 
replacing, reducing and refining nonclinical in vitro/ex vivo experiments, nonclinical 
animal studies or clinical human studies in case of ethical issues or time and cost 
constraints. 

It is also particularly significant with the new medical device regulation⁴ of the European 
Commission, where scientific evidence used to assess high-risk medical devices must 
be based on methodologically sound trials, which may be supplemented with alternative 
evidence sources such as computational modelling and simulation (Olberg et al., 2017). 


3 https://www.asme.org/codes-standards/find-codes-standards/v-V-40-assessing-credibility-computational-modeling-verification-validation-application-medical-devices. 

4 https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32017R0745. 


6.4.2 Pharmaceutical Products 


The clearest indication for using simulation methods is when direct experimentation via 
randomised controlled trials (RCTs) is impossible due to cost, time, or ethical constraints. 
In this regard, RCTs can themselves be considered a form of simulation, as they represent 
and simplify the system under study. However, computer simulations of these trials 
typically decrease time and cost, besides overcoming some ethical restrictions of 
experimentation on humans. These ethical limitations mainly arise when a question needs 
exploring (e.g., the effects of an exposure), but conducting the trial would require 
exposing a vulnerable group to unacceptable risks (Stahl, 2008). 

Computational methods aim to complement in vitro and in vivo tests to minimise the 
need for animal/human testing, reduce the cost and time of toxicity tests, and improve 
toxicity prediction and safety assessment. In silico toxicology encompasses simulation 
tools for biochemical dynamics and modelling tools for toxicity prediction. They are 
useful in drug design to determine how drugs should be altered to reduce their toxicity. 
In turn, this knowledge can be used for the evaluation of pharmaceuticals and to enrich 
clinical evidence. 

For example, there are methods for predicting outcomes based on chemical analogues 
with known toxicity. On the other hand, researchers also use dose-response or 
time-response models, which establish relationships between doses or time and the 
incidence of a defined biological effect (e.g., toxicity or mortality) (Viceconti et al., 2017). 
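
As an illustration of the dose-response relationships mentioned above, the following minimal Python sketch uses the Hill equation, which is one common functional form for such curves and not necessarily the one used in the cited work. The function name and the parameter values (the dose producing half of the maximal effect and the Hill slope) are hypothetical and chosen only for illustration.

```python
def hill_response(dose, ed50, hill_slope, max_effect=1.0):
    """Predicted incidence of an effect at a given dose (Hill equation).

    dose        administered dose (same unit as ed50)
    ed50        dose at which half of the maximal effect is reached
    hill_slope  steepness of the curve
    max_effect  upper plateau of the incidence (1.0 = 100%)
    """
    if dose <= 0:
        return 0.0
    return max_effect * dose**hill_slope / (ed50**hill_slope + dose**hill_slope)


# Hypothetical parameters, for illustration only.
ED50 = 10.0
HILL_SLOPE = 2.0

for dose in (1.0, 5.0, 10.0, 20.0, 50.0):
    incidence = hill_response(dose, ED50, HILL_SLOPE)
    print(f"dose = {dose:5.1f} -> predicted incidence = {incidence:.3f}")
```

In practice, the two parameters would be estimated from experimental data rather than fixed by hand, and the uncertainty of that estimate would need to be reported alongside the prediction.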


6.5 In Silico Methodologies: Product Life Cycle and HTA 


At the cost of oversimplifying, any health technology's development and assessment 
cycle can be reduced to a few macro-phases: design/discovery, pre-clinical and clinical 
assessment, regulatory assessment, market access and post-marketing assessment. 
Decision-maker uncertainty is high in the discovery and design phase, when new and 
emerging health technologies have not yet generated any evidence regarding the future 
value they could bring to health systems. The further a technology moves along its 
diffusion curve, the more evidence is generated and the more uncertainty is reduced. 
In silico methodologies have the potential to support all steps of the product life cycle 
(see Table 6.1). 


6.6 Methodologies for in Silico Clinical Studies 


6.6.1 HTA Health Technology Assessment 


Throughout the product life cycle, the industry increasingly relies on computational mod- 
elling and simulation to speed development and provide additional performance and safety 
assurance. Still, such use is currently limited (Viceconti et al., 2016). According to the 
results of a 2014 survey of medical device companies, computational modelling and sim- 
ulation were more commonly utilised in the early stages of product development or after 
product commercialisation, but rarely to simulate the interaction of the device with a 
laboratory animal or a patient.⁵ 

When in silico methodologies are used as a source of evidence for health technology 
development, extending the traditional HTA that informs coverage/reimbursement decisions 
to an early HTA that informs early research, development, and investment decisions 
(Tummers et al., 2020) could be of great importance, especially for medical devices, 
where the development process is a costly and uncertain undertaking (Ijzerman & Steuten, 
2011). Failed development results not only in a lack of economic return for the company, 
but also in higher costs without healthcare improvements for society. There are multiple 
reasons for failed device development, but one important factor is that the potential of 
the device in healthcare practice is evaluated late, usually only after the prototype 
design is finalised. The aim of an early assessment is to reduce the failure rate at each 
stage of the development process, while enhancing R&D efficiency and limiting the use 
of resources, by prioritising the innovations most likely to succeed. It may also be used 
to support reimbursement claims by providing quantitative input for developing 
risk-benefit sharing agreements (Markiewicz et al., 2014). With improved confidence in 
modelling results and a better-established regulatory framework, the use of in silico 
evidence as part of the regulatory submission process is becoming more common. However, 
it has not yet entered the HTA arena, where evidence from in silico methodologies 
is seldom used. 


6.6.2 Discovery, Design and Pre-clinical Stages 


Using in silico methodologies in the discovery and design stage can potentially streamline 
target identification, secure proof of concept, and identify those drugs/devices worthy of 
progressing into preclinical and clinical development. In silico methodologies can also 
help to identify which new and emerging health technologies have the potential to 
satisfy unmet health system needs. 


5 https://www.avicenna-alliance.com/upload/avicenna-alliance-position-paper-in-silico-evidence- 
application-to-medical-devices-28-may-2021_64bfd88f420a0.pdf. 

Compared with in vitro, ex vivo, and in vivo experiments, in silico simulations are 
fast, cheap, safe, easy to implement and free of experimental errors. Consequently, they 
are becoming increasingly helpful in designing new technologies and strategies. 

Simulations and computational models allow the effect of the interactions to be examined 
not only at the local level but also in the context of the entire pathway in which the 
target interacts. To capture all the features of the complex systems in these pathways, 
modelling at the biochemical level may be a suitable foundation for simulation. In this 
sense, different computational models have been proposed to simulate intercellular 
interaction at the biochemical and physical levels. By means of this type of model, 
information on the impact of the target on metabolism can be obtained. 

Preclinical in silico assays can potentially minimise problems in the translation 
between experimental and clinical research. Moreover, preclinical data can be a valid 
source to include in a computational model to gain additional insight into the factors that 
modulate the response in later clinical phases. For this reason, in silico experiments have 
the capacity to reveal and formalise the underlying mechanisms. 

In silico methodologies can be particularly important in chemoprevention and 
toxicology (Benigni et al., 2020; Valerio, 2009). They are used effectively in preclinical 
studies to optimise dosage administration and predict the overall performance of the 
optimised schedule (Pappalardo et al., 2019). Because the number of chemicals marketed 
for human use is rapidly increasing, computational toxicology models have been developed 
that estimate the probability of an adverse event for a molecule based on its chemical 
structure (Quantitative Structure-Activity Relationship, or QSAR). 

In silico experiments are widely used to predict the toxicological outcomes of drugs and 
to support hazard and risk assessment. Such experiments can determine the priority of 
molecules for in vivo or in vitro testing. This prioritisation optimises the testing 
strategy, potentially minimising the need for animal testing (Benigni et al., 2020). 

In this regard, Passini et al. have recently developed software which runs in silico 
drug trials in populations of human cardiac models, simulating populations of human 
action potentials. Designed to predict drug safety and efficacy, the software simulates the 
effects of drugs on the action potentials of cardiac cells. After conducting variable drug 
dose-response studies, this software provides statistics on biomarkers of drug action and 
adverse drug effects, such as arrhythmias, with good clinical accuracy. For example, an 
in silico trial of 62 drugs predicted clinical risk with 89% accuracy 
(Passini et al., 2017, 2021). In 2011, the US Food and Drug Administration (FDA) approved 
the first in silico type 1 diabetes model as a possible substitute for pre-clinical animal 
testing of new control strategies for type 1 diabetes. The European Medicines Agency is 
also considering in silico approaches as an alternative to animal testing to protect 
animal health and the environment. 


6.6.3 Clinical Development 


6.6.3.1 Medical Devices 

Computational models of the heart based on data obtained from medical imaging of 
patients have made it possible to use simulations to view different strategies for cardiac 
rhythm configuration. They have also enabled the identification of the optimal region 
for localising cardiac pacing. These models are not yet widely accepted as medical 
devices for clinical decision-making. However, this example illustrates how such models 
and simulations can be applied in personalised medicine. 

An example of these developments in patient-level simulation is blood flow simulation 
using MRI images combined with blood pressure and blood flow information. With the 
CRIMSON software (Arthurs et al., 2021), 3D models of the arterial system are created 
and used to determine a prognosis and then to perform an intervention that best preserves 
blood flow (Ahmed et al., 2021). 

The oncNGS pre-commercial procurement⁷ (PCP) procedure aims to develop novel, affordable 
solutions to provide the best Next Generation Sequencing (NGS) tests for all solid 
tumour/lymphoma patients. The call for tender,⁸ launched in December 2021, challenges 
the market to address the identified unmet needs through the provision of 
efficient molecular DNA/RNA profiling of tumour-derived material in liquid biopsies 
using a pan-cancer tumour marker analysis kit. This includes NGS analysis integrated 
with an in silico decision support system that also provides analytical test 
interpretation and reporting. The oncNGS PCP contract is structured in three phases: 


— Phase 1: Design of the oncNGS solution 

— Phase 2: Technical, analytical and clinical performance validation of the oncNGS 
complete solution prototype at the Supplier’s site 

— Phase 3: Technical, analytical and clinical performance validation of the oncNGS 
solution in the clinical samples in Supplier’s sites and real clinical settings. 


To ensure suppliers keep working on the sustainability of the novel solutions 
across the three phases, they are required to keep up-to-date in silico simulations of their 
novel panels during both Phase 2 and Phase 3. These simulations should demonstrate that 
the solutions are affordable while ensuring sufficient and homogeneous coverage of all 
the targets, in agreement with the business case, so that they can be applied on a routine 
basis at each (chemo)therapy cycle to follow clinical response and inspire adaptive 
therapies. 


6 Non-invasive simulated electrical and measured mechanical indices predict response to cardiac 
resynchronization therapy - Research Portal, King’s College, London (kcl.ac.uk). 

7 http://oncngs.eu/. 

8 https://ted.europa.eu/udl?uri=TED:NOTICE:624705-2021:TEXT:EN:HTML&tabId=1. 


6.6.3.2 Pharmaceuticals 

Phase III clinical trials evaluate a new drug in terms of its clinical value (efficacy and 
safety), its most appropriate dose and dosage (posology), as well as other aspects such as 
adherence and tolerability. These in vivo studies are expensive and challenging to conduct, 
as a large sample size is required. 

By providing a reliable prediction of the Phase III outcomes based on the data col- 
lected during the Phase II clinical trial, in silico methodologies may increase confidence 
in investing at this late stage of the pre-commercial process. High-quality in silico method- 
ologies using subject-specific models could be proposed as valid evidence to complement 
the information from these trials, possibly with the requirement to carry out studies to 
confirm the simulated post-marketing outcomes with real-world data. 

By using in silico methodologies to predict outcomes for potential phase III trials, it 
is possible to optimise both the experimental design and the required sample size. As a 
result, the development cost could be reduced, as well as the time to market (Pappalardo 
et al., 2019). 
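
To illustrate how a predicted treatment effect could inform the sample size of a Phase III trial, the sketch below applies the standard normal-approximation formula for comparing two means with equal allocation, n = 2σ²(z₁₋α/₂ + z₁₋β)²/δ² per arm. This is a generic textbook calculation, not the procedure used in the cited work. The function name, the candidate effect sizes and the standard deviation are hypothetical placeholders for values that an in silico trial might predict.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(effect, sd, alpha=0.05, power=0.80):
    """Patients per arm for a two-arm, two-sided comparison of means.

    effect  expected difference between arms (delta), e.g. predicted in silico
    sd      assumed common standard deviation of the outcome (sigma)
    alpha   two-sided type I error rate
    power   desired probability of detecting the effect
    """
    if effect == 0:
        raise ValueError("effect must be non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = 2 * (sd * (z_alpha + z_beta) / effect) ** 2
    return math.ceil(n)


# Hypothetical predicted effects and outcome variability, for illustration only.
PREDICTED_EFFECTS = (2.5, 5.0, 10.0)
ASSUMED_SD = 10.0

for effect in PREDICTED_EFFECTS:
    n = sample_size_per_arm(effect, ASSUMED_SD)
    print(f"predicted effect = {effect:4.1f} -> {n} patients per arm")
```

Because the required sample size grows with the inverse square of the expected effect, even a modest improvement in the precision of the predicted effect can change the planned trial size substantially.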


6.6.4 Market Access and Post-marketing Assessment 


FFRCT software, developed by the US medical firm HeartFlow to provide a non-invasive 
quantification of the fractional flow reserve in coronary stenosis, was the first clinical 
technology based on subject-specific modelling to get marketing authorisation from the FDA. 
The software has also received CE marking and regulatory approval in Japan. 

What might be relevant from the perspective of decision-makers is the possibility of 
testing and identifying in advance which patients’ subgroups are likely to benefit the most 
from a novel technology (i.e., enhanced patient population stratification) or to investigate 
and provide empirical evidence of safety issues that could emerge as a result of the 
implementation of the technology with consequent streamlined recommendations for a 
safer and effective indication of use (Ciani et al., 2017). 

Another relevant example is the stratification of patients with infectious diseases due 
to multi-drug resistant (MDR) organisms. Through the research and development services 
contracted under the Anti-SUPERBUGS pre-commercial procurement,⁹ the 
ANTI-SUPERBUGS PCP Buyers' Group aims to: 


— Reduce both the costs and the operational impact resulting from infections caused by 
multi-drug resistant organisms; 

— Improve the appropriateness of antimicrobial medicine usage; 

— Improve the quality-of-care processes in hospitals; 


9 https://antisuperbugs.eu/. 

— Reduce the community and social care impact of MDR infections acquired in hospitals 
by procuring pre-commercial technologies that will transform current surveillance and 
infection control systems into new comprehensive systems. 


The 2019 call for tender challenges the market to address the identified unmet 
needs through the provision of an anti-superbug in silico solution comprising a bundle 
of technologies offering different approaches and outputs at different levels of 
infection management (such as surveillance, environmental safety, first patient screening 
and early patient diagnosis). 

Subsequent public procurement of innovative solutions (PPI), already under preparation, 
will need to consider that the current COVID-19 pandemic is exacerbating 
antimicrobial resistance. Data from some EU countries suggest that 6.9% of COVID-19 
diagnoses are associated with bacterial infections (3.5% diagnosed concurrently and 
14.3% post-COVID-19), with higher prevalence in patients who require intensive 
care (Strathdee et al., 2020). In silico methodologies offer the advantage of enlarging 
the cohorts, refining clinical validation, and taking this new potential use case into 
consideration, including intensive care unit (ICU) patients infected by COVID-19 and MDR 
organisms. They can thus demonstrate value to the buyers by predicting the real-life 
benefits and the optimal target use case. 


6.6.5 Post-marketing Assessment 


In silico studies should also be part of adaptive licensing and reimbursement pathways, 
where access and coverage are gradually extended as the evidence base evolves 
and benefits are demonstrated in clinical research for broader patient populations. This 
overlapping interest from regulatory bodies, industry, clinics, academia, and even 
animal-welfare groups has led to the establishment of networks and initiatives worldwide 
to promote the development, validation, and use of in silico medicine technologies. 


6.7 Critical Assessment of the in Silico Approach and Limitations 


Attention should also be drawn to the limitations of current in silico simulation tools 
to provide a balanced perspective regarding their potential role in future HTA. 

Several limitations of these techniques should be considered when evaluating their 
use in HTA. Primarily, it should be noted that these techniques do not 
currently allow adequate predictions for all chemicals and outcome variables. Of particular 
relevance is that there are currently no models for certain systems or components. 

Model adequacy is particularly important when evaluating complex systems, such as 
drugs with multiple mechanisms of action or the interaction of different drugs in 
polymedicated patients. 

These limitations are partially a result of the limited reliability or transparency of the 
data used to design the model on which the simulations will be performed. For example, 
errors in training data describing the relationship between dosage and adverse events 
would propagate to, and could be amplified in, the prediction model. 

Using in silico techniques may add uncertainty if they replace in vivo experimentation, 
which is why assessing the external validity of the simulation results is desirable. 
Recognising the limitations of the technology, there is an increasing interest in combining 
real-time generated biological data with in silico predictions using a rational approach to 
integrating computational tools with the experimental setting (Jolivette & Ekins, 2007). 
Using in silico evidence to reduce or refine in vivo or in vitro experimentation can reduce 
such uncertainty if reliable and valid models are available. 


6.8 How to Assess Evidence from in Silico Methodologies? 


In product development and evaluation, in silico models of increased complexity are often 
used for similar applications as the ‘simpler’ pharmacometrics models, e.g., trial design 
optimisation, dose-finding/selection, extrapolation of drug efficacy and safety, etc. 

For this reason, provided that the model validation processes described in the previous 
chapters have been carried out correctly, it is logical that requirements for their 
acceptability follow the same standards as those already established for models currently 
included in the regulatory dossier or parallel HTA requests. 

It is important to document these experiments thoroughly to ensure that the in sil- 
ico technology is properly evaluated. This documentation should allow independent 
evaluation by HTA bodies in the specific CoU of the technology. 

When evaluating these experiments, it is important to assess the reliability and rele- 
vance of the models used, particularly when the models could pose a risk to patients, 
involve complex systems, or when there is a considerable distance between the nature of 
the input (for example, chemical-physical parameters) and the nature or dimension of the 
output (health symptoms). 

Last but not least, HTA bodies might need to assess the credibility of the predictive 
model, as discussed in Chap. 4, as well as the quality of the software artefact 
implementing it, as described in Chap. 3. In particular, it is necessary to perform 
sensitivity analyses on those parameters that have the highest uncertainty and a moderate 
or high impact on the results. In turn, granting HTA bodies access to the models and 
the data processing schemes would facilitate the assessment of the model in 
the specific contexts under evaluation. 
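
The sensitivity analysis described above can be sketched, in its simplest one-at-a-time form, as follows. Each uncertain parameter is moved to the ends of its plausible range while the others stay at baseline, and parameters are then ranked by the resulting swing in the model output. The toy exposure model, the parameter names and all numerical ranges are hypothetical stand-ins for a real predictive model; more rigorous global methods (e.g., variance-based approaches) may be preferable when parameters interact.

```python
def toy_exposure_model(params):
    """Hypothetical stand-in for a predictive model: steady-state drug exposure."""
    return params["dose_rate"] * params["bioavailability"] / params["clearance"]


# Hypothetical baseline values and plausible (low, high) uncertainty ranges.
BASELINE = {"dose_rate": 100.0, "bioavailability": 0.8, "clearance": 5.0}
RANGES = {
    "dose_rate": (95.0, 105.0),
    "bioavailability": (0.6, 1.0),
    "clearance": (2.5, 10.0),
}


def one_at_a_time(model, baseline, ranges):
    """Return the output swing per parameter, sorted from largest to smallest."""
    swings = {}
    for name, (low, high) in ranges.items():
        outputs = []
        for value in (low, high):
            params = dict(baseline)   # copy so the baseline is never modified
            params[name] = value      # perturb one parameter only
            outputs.append(model(params))
        swings[name] = max(outputs) - min(outputs)
    return dict(sorted(swings.items(), key=lambda item: item[1], reverse=True))


swings = one_at_a_time(toy_exposure_model, BASELINE, RANGES)
print(f"baseline output = {toy_exposure_model(BASELINE):.2f}")
for name, swing in swings.items():
    print(f"{name:16s} output swing = {swing:.2f}")
```

A ranking of this kind helps an assessor see which inputs drive the prediction and therefore where better data or tighter justification of the assumed ranges would be most valuable.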


6.9 Challenges for the Future 


As stated at the beginning, in silico technologies could also be a health technology or 
part of a health technology, as in Digital Patient or Digital Twin technologies. These 
are predictive computer models used as decision-support systems by a clinician 
in treating an individual patient. From a regulatory point of view, these are considered 
"Software as a Medical Device" or "Medical Device Software". Still, in addition to the 
specific requirements for Software Quality Assurance (see Chap. 3), such medical devices 
should be certified for their model credibility (see Chap. 4). Computational modelling and 
simulation results might eventually be included in regulatory submissions. In that case, 
incorporating this predicted evidence needs to follow data/evidence generation, analysis, 
and reporting standards to enable the regulatory bodies (and HTA agencies) to assess the 
submitted material efficiently. 

In 2021, the Horizon Europe Framework Programme envisaged a line of action 
to provide regulatory agencies and HTA bodies with the necessary tools to exploit the 
potential of synthetic data¹⁰ for decision-making in the field of regulation and health 
technology assessment. One of the challenges of research in this area is determining 
the evidence value of this information source. Overall, there is a need for rigour and 
transparency in the methods used for in silico model development and validation, as well 
as their wider acceptance as a valuable source of evidence by the scientific community, 
including academic researchers, the pharmaceutical industry, regulatory bodies and HTA/ 
payers (Musuamba et al., 2021). A need exists for documenting the available tools, the 
ways in which they are being used, the conditions for their adequate use and the challenges 
encountered. The current hurdles for the wider acceptability of in silico models as a 
reliable source of evidence for high-impact (HTA) applications in drug/medical device 
development include: 


— lack of common standards and best practice documents commonly accepted by all
relevant stakeholders, 

— the lack of important digital infrastructure to carry out the in silico methodologies 
(e.g., fast communication networks and high computing power and storage capacity) 
that could compromise the cost-effectiveness of the resulting health technologies and 
the coverage, reimbursement or pricing decisions by the public healthcare systems (Leo 
et al., 2022), 


10 Synthetic data is information that's artificially generated rather than directly captured by real- 
world events. Typically created using algorithms, synthetic data can be deployed to validate mathe- 
matical models and to train machine learning models (https://www.infog.com/articles/overcoming- 
privacy-challenges-synthetic-data/). 
— the protection of individual citizens from harmful use, also due to security breaches, of
their personal data. An approach to solving the challenge surrounding big health data 
sharing is the generation of synthetic data created from real data by adding statistically 
similar information, 

— biases in algorithm definition and poor training of analysts, which may pose risks to equity,

— poor communication between stakeholders in that regard, 

— the deficit in the skills and knowledge essential to perform HTA based on in silico 
methodologies along the technology life cycle, and 

— relatively slow development of regulatory science and HTA as compared to commercial 
solution developments. 


Also, there is currently an unmet need for HTA guidance/best practice documents clearly 
describing standards for mechanistic in silico model development, evaluation and report- 
ing considering the specificities not only in their structure and the data sources for 
their construction and evaluation but also in the software and algorithms used for their 
implementation. 

Finally, further research is needed to understand the promises of the use of in silico
methodologies for the development and evaluation of health technologies, to improve their 
reliability, acceptance, and diffusion and to understand their expected impact on licensing 
and reimbursement decisions, as well as the role that HTA can have in the various phases 
of the application of in silico methodologies. 


6.10 Definitions of Various HTA Modalities 


Horizon scanning (Simpson and EuroScan International Network, 2014): this is the sys- 
tematic identification of new and emerging health technologies that have the potential to 
impact health, health services, and society; and which might be considered for an HTA. 
Identification can be: 


Proactive: where a range of sources are searched for information on new and emerging 
health technologies. 


Reactive: where systems are in place that allow stakeholders, health professionals, devel-
opers and/or consumers to inform the Early Awareness and Alert (EAA) system about new
and emerging health technologies.
Pre-commercial procurement (PCP)¹¹,¹²: public procurers can drive innovation from the
demand side by acting as technologically demanding customers that buy the development
and testing of new solutions from several competing suppliers in parallel to compare
alternative solution approaches and identify the best value for money solutions that the
market can deliver to address their needs. PCP consists of a procurement of Research &
Development (R&D) services that involves risk-benefit sharing at market conditions and
in which a number of companies develop in competition new solutions for mid-to-long- 
term public sector needs. The needs are so technologically demanding and in advance of 
what the market can offer that either no commercially stable solution exists yet, or existing 
solutions exhibit shortcomings which require new R&D. R&D is split into phases: solution 
design, prototyping, original development, and validation/testing of a limited set of first 
products. 


Early scientific advice (early dialogues)¹³ (Ijzerman & Steuten, 2011; Tummers et al.,
2020): this is non-binding scientific advice, given before the start of pivotal clinical trials
(after the feasibility/proof of concept study), in order to improve the quality and appropriateness
of the data produced by the developers in view of future HTA assessment/re-assessment.
Early HTA is increasingly being used to support health economic evidence development 
during early stages of clinical research. Such early models can be used to inform research 
and development about the design and management of new medical technologies to miti- 
gate the risks, perceived by industry and the public sector, associated with market access 
and reimbursement. 


Initial HTA: early-phase HTA helps technology owners or investors make evidence-informed
decisions about further investment in the development of medical devices
and other health technologies, especially with expected public reimbursement or procurement.
It attempts to provide appropriate value judgement and assessment of health
financing scenarios of innovative technologies before moving ahead with the development 
process or investing in technology. 


Public procurement of innovative solutions (PPI)¹⁴: PPI happens when public health
system bodies and providers use their purchasing power to address their identified chal-
lenges, acting as early adopters of innovative solutions that are not yet available on a
large-scale commercial basis but are nearly on the market, or already there in small
quantities, and do not need new R&D.


11 https://digital-strategy.ec.europa.eu/en/policies/pre-commercial-procurement.

12 {COM(2007) 799 final}, SEC(2007) 1668, COMMISSION STAFF WORKING DOCUMENT
accompanying document to the COMMUNICATION FROM THE COMMISSION TO THE EUROPEAN
PARLIAMENT, THE COUNCIL, THE EUROPEAN ECONOMIC AND SOCIAL COMMITTEE
AND THE COMMITTEE OF THE REGIONS Pre-commercial Procurement: Driving
innovation to ensure sustainable high quality public services in Europe. Example of a possible
approach for procuring R&D services applying risk-benefit sharing at market conditions, i.e.
pre-commercial procurement, Brussels, 14.12.2007.

13 https://www.eunethta.eu/ja3services/early-dialogues/.

14 https://digital-strategy.ec.europa.eu/en/policies/ppi.


Mainstream HTA (Reuzel and Van Der Wilt, 2000): mainstream HTA entails scientific
research into the effects and associated costs of health technologies and should support
decision-makers in deciding on questions such as ‘Is this technology better than the technol-
ogy currently used?’, ‘How does it compare with alternatives in terms of effectiveness,
appropriateness and cost of technologies?’ (see Sect. 6.3).


Coverage and Reimbursement policy¹⁵: in decision-making processes regarding the reim-
bursement of medicines, it needs to be established whether a medicine should be
considered eligible for reimbursement. Subsequently, if the medicine is classified as
‘reimbursable’, it needs to be assessed how much of the price the public payer should (or
can) cover. Therefore, setting a price (pricing) and deciding on the level of coverage by
public payers (reimbursement) are strongly interlinked. The assessment process usually
includes criteria such as efficacy, effectiveness, safety, ease of use, and added therapeutic
value, besides cost-effectiveness. In some European countries, the same decision-making
process is now used for digital therapeutics.¹⁶


Value-Based Public Procurement¹⁷: public contract based on the value it generates across
the whole healthcare provision chain (from the patients to the healthcare professionals,
the healthcare providers and the payors).


Re-assessment HTA¹⁸: Re-assessment assesses changes that may occur in medical tech-
nologies as they mature, as well as any new evidence available or other factors that can 
diminish past HTA findings and their utility for health care policies. As such, HTA can be 
more of an iterative process than a one-time analysis. Coverage and reimbursement poli- 
cies and subsequent value-based public procurement contracts shall consider the results 
of HTA reassessments. 


15 J. Bouvy, S. Vogler (2013) Update on 2004 Background Paper, BP 8.3 Pricing and Reimbursement
Policies, WHO Collaborating Centre for Pharmaceutical Policy and Regulation https://ppri.goeg.at/ 
sites/ppri.goeg.at/files/inline-files/Bouvy PriorityMedicines 2013 BP8 3 pricing 2.pdf. 


16 Driving the digital transformation of Germany's healthcare system for the good of patients- 
Bundesgesundheitsministerium. 

17 Rossana Alessandrello, Ion Arrizabalaga Garde, Uxío Meis Piñeiro, Olman Alonso Elizondo
Cordero, Maria Sanchis-Amat, Ramon Maspons (2021). Teoria del canvi, resultats neutrals respecte
al tipus de necessitats no satisfetes en l'àmbit de la salut i permeabilitat de les compres públiques
d'innovació al valor. Annals de Medicina, vol 104, (2021). Barcelona: Acadèmia de Ciències
Mèdiques i de la Salut de Catalunya i de Balears.

18 National Information Center on Health Services Research and Health Care Technology 
(NICHSR) (2014) HTA 101: Introduction to Health Technology Assessment. 
6.11 Essential Good Simulation Practice Recommendations

In silico methodologies can provide evidence to be used in HTA for:


— demonstrating value to payers by predicting the real-life benefit and the optimal target 
population for drugs or medical devices 

— transposing Phase 3 trial results into virtual populations representative of specific 
geographies and context 

— benchmarking competing health technologies by also considering the market access of
new technologies and the achieved effectiveness in the real world.
Cécile F. Rousseau, Emmanuelle M. Voisin, Elisabetta Poluzzi,
Alexandre Serigado, Marco Viceconti, and Maria Cristina Jori 


7.1 Introduction 


Before any experimental study can be conducted on humans, the study design must be 
approved by an independent body responsible for protecting the safety, well-being and 
rights of the human subjects involved in the experimentation. These bodies are called 
Independent Ethics Committees in Europe and Institutional Review Boards in the USA; 
hereinafter, we will use the acronym IEC/IRB to indicate them. 

Existing regulatory, legal and ethical frameworks for clinical trials were developed 
because of well-established medical research practices involving human subjects. Rules 
were set to protect human research subjects from hazards. By contrast, in silico medical 
research relies on computational resources and data, using patient data to generate and
validate computer models, which will be used to predict the necessary evidence. 


7.2 Short Overview of Ethical Review in Clinical Trials 


Research involving humans originated in a dark past, where human rights, safety and well- 
being were disregarded. And that past is not necessarily so remote (see, for example, the 
Tuskegee Study of Untreated Syphilis in the Negro Male¹). With the progressive adoption
of the Declaration of Helsinki and the establishment of Good Clinical Practice² (GCP),
Applicants/Sponsors and Investigators are required to ensure the proper conduct of the 
clinical trials. 

Ethical aspects of any clinical trial are ensured by IEC/IRBs. These entities, which
are either local or central, aim to ensure the safety, rights, and well-being of all subjects, 
whether healthy volunteers or patients, enrolled in any clinical experimentation. Although 
rules were originally defined for trials on medicinal products,³ any trial involving experi-
mental interventions (e.g., for surgical procedures or medical devices) must be submitted
to IEC/IRBs before it starts. Prospective and retrospective observational studies
are also submitted to IEC/IRBs to assess risks from additional diagnostic procedures, data
protection, and the relevance of the research question.

IEC/IRBs review the clinical protocol and the corresponding amendments, the written
information on aims, procedures, and rights to be provided to subjects, and the rele- 
vant written informed consent forms. They also oversee the enrolment process, including 
procedures, compensation payments (when appropriate), insurance coverage, the Investi- 
gator's qualifications, etc. IEC/IRBs are therefore involved before, during, and after the 
clinical trial. 


7.3 The Ethical Benefits of In Silico Methodologies 


In silico methodologies aim to refine, reduce, and replace experimental studies conducted 
in vitro, ex vivo, or in vivo on animals or humans and provide evidence on medical 
products’ safety, efficacy, and performance. 

If we focus on in silico methodologies aimed to refine, reduce, and replace human 
experimentation, several potential ethical benefits can be associated with these new 
technologies. 


1 https://en.wikipedia.org/wiki/tuskegee_syphilis_study.
2 https://www.ema.europa.eu/en/ich-e6-r2-good-clinical-practice.

3 https://health.ec.europa.eu/medicinal-products/clinical-trials/clinical-trials-regulation-eu-no-536
2014 en.
7.3.1 Refinement


Refining human experimentation means reducing the risks to which the enrolled sub- 
jects are exposed and increasing the benefit/risk ratio of the experimentation. This means 
maximising the regulatory utility of the information obtained by exposing the enrolled 
subjects to such risks. In silico methodologies have been proposed to stratify patients bet- 
ter, improving the inclusion and exclusion strategies. This may produce ethical benefits 
when it helps to identify subjects at higher risk of adverse effects. Where this does not bias 
the conclusions of the study, such patients can be excluded; alternatively, their identifica- 
tion allows the adoption of measures to mitigate the risk, such as additional monitoring. In 
some cases, in silico methodologies can also directly reduce the risk for enrolled subjects. 
For example, studies in cardiology that require an invasive fractional flow reserve (FFR) 
measurement can now be conducted using CT-based virtual FFR models that provide a 
non-invasive estimate of the FFR for each subject enrolled. 


7.3.2 Reduction 


When an in silico methodology can reduce the number of subjects who need to be 
enrolled, and thus the number of persons who are exposed to the risks that the study 
involves, this represents a direct ethical benefit, according to the 6th principle of the Dec-
laration of Helsinki: “In medical research involving human subjects, the well-being of the
individual research subject must take precedence over all other interests”. The most obvi-
ous examples are in silico-augmented clinical trials, where virtual and physical patients
are combined (Haddad et al., 2017). Other examples are those cases where the primary 
outcome is not easily observable, and thus a surrogate biomarker is used to measure 
response or efficacy. In studies on new drugs to prevent fragility bone fractures caused 
by osteoporosis, areal bone mineral density is frequently used as a surrogate of the frac- 
ture endpoint. A CT-based digital twin can predict the absolute risk of fracture for each
patient enrolled; because this predicted quantity has much higher discriminant power, the
number of patients that must be enrolled in the clinical studies to achieve adequate
statistical power is much smaller (Viceconti & Dall’Ara, 2019).
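
The effect of discriminant power on enrolment can be illustrated with a standard
sample-size calculation. The sketch below is not taken from the cited studies: it assumes
a two-arm comparison of means under a normal approximation, and the function name
patients_per_arm and both effect sizes are hypothetical, chosen only for illustration.

```python
from math import ceil
from statistics import NormalDist


def patients_per_arm(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-arm sample size for a two-arm comparison of means.

    Normal-approximation formula: n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2,
    where d is the standardised effect size (difference in means / common SD).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # target statistical power
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


# Hypothetical standardised effect sizes (assumptions, not values from the text)
surrogate_effect = 0.5  # a less discriminating surrogate endpoint
predicted_effect = 1.0  # a more discriminating model-predicted endpoint

n_surrogate = patients_per_arm(surrogate_effect)
n_predicted = patients_per_arm(predicted_effect)
print(n_surrogate, n_predicted)
```

Under these assumptions, doubling the standardised effect size cuts the required enrolment
per arm to roughly a quarter, because the sample size scales with the inverse square of the
effect size.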


7.3.3 Replacement 


The complete replacement of human experimentation is currently not considered an 
option. However, there are several cases where human experimentation is impossible and 
others where a partial replacement might be an option. Human experimentation is impos- 
sible, for example, in assessing the MRI safety of implantable devices (e.g., heating of 
the device due to high-frequency electromagnetic pulses); here in silico methodologies 
may provide evidence more reliable than animal experiments can provide (Baretta et al.,
2020). Scenarios of partial replacement are those where, for example, the digital twin of
the subjects enrolled could be used to form a placebo arm in studies where the placebo is
considered unethical. Also, in silico methodologies can reduce the size of the clinical
study required to achieve statistical significance (Haddad et al., 2017). The last scenario
where in silico methodology may introduce ethical benefits is in studies where, for sev- 
eral reasons, the necessary diversity (of ethnicity, gender, age, physical conditions, etc.) 
is difficult to achieve with the necessary statistical relevance. In silico-augmented clinical 
trials could be designed not to increase the study's statistical power but to make it more 
representative by including tailored virtual patients of such underrepresented sub-groups. 


7.4 The Ethical Review of Studies Involving In Silico 
Methodologies 


When assessing a medical product involves in silico methodologies, are there special
considerations that the IEC/IRB needs to address in its review? To be answered, this general
question must be broken down into more specific questions.

What is the role of the IEC/IRB when in silico methodologies are used to refine (i.e.,
to improve rather than to reduce or replace) human studies? We believe that the IEC/IRB
is responsible for evaluating whether a proper risk analysis has been conducted as part of
the in silico methodology implementation and whether this in silico approach reduces the
risks for the subjects enrolled in the clinical trial or helps mitigate the adverse effects
should such risks materialise. In other words, the IEC/IRB needs to evaluate the ethical
impact of in silico methodologies as it does for any other study methodology. However,
this raises an issue of expertise in the current composition of the IEC/IRB: such an
evaluation of in silico methodologies may require expertise rarely present in a typical
IEC/IRB. At a time when studies involving in silico methodologies may still be a rarity,
the IEC/IRB may circumvent this problem by collecting, in such cases, the opinion of
external experts to inform its own decisions. Still, it is reasonable to expect the inclusion
of technology experts in the IEC/IRB in the long run. Submissions to the IEC/IRB should
also be extended to include the technical information necessary to evaluate such in silico
methodologies.

If in silico methodologies are used to reduce the number of subjects enrolled in human 
studies, we do not see any significant change in how the IEC/IRB operates. In this case, 
all the concerns relate to the reliability of a study's evidence, which is a matter for the regulatory
bodies, not the IEC/IRB. Any means that can reduce the number of subjects enrolled 
without impacting the statistical relevance of the study should be seen positively from an 
ethical point of view. 

The case where in silico methodologies replace human experimentation is the most 
complex. 
The first question is: when a study involves only in silico methodologies (for example,
in full replacement scenarios), is the IEC/IRB review necessary, considering no human 
subjects are involved in the study? We believe the answer is no, with one notable excep- 
tion. IEC/IRBs ensure the safety of human subjects involved in the study; if no human 
subject is involved, there is no need for the IEC/IRB review. The only exception is when 
we need to use clinical data to design, inform, or validate the in silico methodology. 
In this case, the IEC/IRB review is required to ensure that the patient's data is treated 
according to the laws and the ethical principles that regulate these aspects. Frequently the 
clinical data to be used in the modelling activities are not collected on purpose; this poses 
the complex issue of re-using clinical data collected for clinical purposes or for research 
purposes different from the scope of the current study and whether an additional informed 
consent of the patients originally involved may be required. Because of its importance, 
this topic is discussed in greater detail below in a dedicated section. 

But as we explained before, the replacement is only partial in some cases. This
frequently occurs when a portion of the study poses ethical problems (e.g., placebo, chil-
dren, rare diseases). We suggest that such studies should first be subject to a regulatory
advice procedure. The regulatory opinion on the appropriateness of the study design,⁴
including the partial replacement of some human experimentation using in silico method-
ologies, should be acquired by the IEC/IRB, which would focus its evaluation on the
ethical implications of the specific implementation of the study design, relying on the
regulatory opinion as regards the reliability of the evidence such a study will produce.
However, in this case, as in the previous one, the ethical evaluation will be difficult with-
out the involvement of some technology experts. What we wrote before for the refinement
scenario is also valid here: while initially the IEC/IRB may rely on the opinions of exter-
nal experts, in the long run it is reasonable to expect the inclusion of technology experts
in the IEC/IRB.


7.5 Data Protection 


With the growing availability of real-world data, it is tempting to use them to build and
validate computational models. In addition, digital twins in healthcare are informed by the
clinical data of individual patients. For such applications, developers must account for data
protection laws such as, for example, the European General Data Protection Regulation⁵
(GDPR) or the USA Health Insurance Portability and Accountability Act⁶ (HIPAA).


4 As explained in Chap. 6, the regulatory pathways for in silico methodologies are only partially
defined and tend to differ between the USA and Europe.

5 https://gdpr.eu/.
6 https://aspe.hhs.gov/reports/health-insurance-portability-accountability-act-1996.
An additional complexity for European developers is that the GDPR did acknowledge
that the secondary use of clinical data for research purposes could justify some deroga-
tion but made no detailed provisions, leaving the member states to define the specific
legislation. This has led to a very complex situation, where each member state of the
European Union has different legislation. The main problem is not that of privacy (in
most cases, the clinical data used in in silico methodologies are irreversibly anonymised)
but rather that of data ownership. The European GDPR states clearly that the clinical data
are owned by the patient, and the clinical institution where the data were generated is
allowed to process these data only for the necessary provision of care. Any secondary use
must be explicitly authorised by the patient, the data owner, with informed consent. The
point of debate is the granularity of such consent. The orientation of some privacy author-
ities in EU member states is that consent is given for each research project; thus, if the
Investigator plans to reuse the clinical data for another research project, he or she needs
to collect new informed consent from each patient.

The recent EU Data Governance Act promises to solve this problem. This new EU-
wide regulation, which becomes applicable in September 2023, provides rules and
safeguards to facilitate the re-use of data whenever possible. The main mechanism is
that of data altruism. Data altruism is about individuals and companies giving their con- 
sent or permission to make available data that they generate—voluntarily and without 
reward—to be used in the public interest. 


7.6 Credibility Assessment in the IEC/IRB Review 


In most cases, the IEC/IRB is not called upon to directly evaluate the evidence of the
credibility of the in silico methodologies. When the study results are to be used as part of
a regulatory submission for marketing authorisation, it is usually expected that, before an
in silico methodology is used for a specific context of use, a qualification opinion on such
use is obtained from a regulatory agency. In such a case, the qualification opinion should
be attached to the IEC/IRB submission. It should be noted that while in the USA the FDA
can provide pathways for the qualification of in silico methodologies as medical device
and drug development tools, in the EU such a qualification pathway is available only for
drug development tools.

However, it could be good practice to include any available evidence of credibility
in the IEC/IRB submission. For example, if the credibility of the in silico methodology
has been assessed using the ASME VV-40:2018 technical standard, the result summary
of this assessment should be included in the submission.
7.7 Essential Good Simulation Practice Recommendations

In silico methodologies offer several potential ethical benefits:


— Refining human experimentation means reducing the risks to which the enrolled sub- 
jects are exposed but also increasing the benefit/risk ratio of the experimentation, 
maximising the regulatory utility of the information obtained by exposing the enrolled 
subjects to such risks. 

— When an in silico methodology can reduce the number of subjects who need to be 
enrolled, and thus the number of persons exposed to the study's risks, this represents 
a direct ethical benefit. 

— In silico methodologies can provide an ethical alternative where human experimenta- 
tion is unethical. 

— In silico methodologies can help in including in clinical studies the necessary diversity 
(e.g., of ethnicity, gender, age, physical conditions) that, for any reason, might be 
difficult to achieve experimentally. 

— IEC/IRB should evaluate the ethical impact of in silico methodologies as they do for
any other study methodology. There are two special cases, both related to their use to
replace human experimentation:


o For studies where the in silico methodologies are used to partially replace human 
experimentation, the ethical review of the study by the IEC/IRB is necessary. Still, it 
should be based on the regulatory qualification opinion on the in silico methodology. 

o On the contrary, for studies that involve only in silico methodologies and no human 
experimentation, the IEC/IRB review is not necessary, with the notable exception of 
the ethical management of clinical data to design, inform, or validate the in silico 
methodology. 


— To properly assess the ethical implications of in silico methodologies, IEC/IRB also 
need technical expertise. Initially, the IEC/IRB may rely on the opinions of external 
experts. Still, in the long run, it is reasonable to expect the inclusion of technology 
experts in the IEC/IRB. 
The Sponsor


Maria Cristina Jori, Roberta Bursi, and Marco Viceconti 


8.1 Introduction 


For the purpose of this document, we define Sponsor as “an individual, company, institu- 
tion, or organisation that decides to use computer simulations in a preclinical or clinical 
trial, aimed at a regulatory or decision-making purpose, conducted at any point in a
product’s lifecycle, both prior to and following marketing authorisation.” 

According to ICH E6 (R2),¹ the Sponsor is “An individual, company, institution, or
organisation which takes responsibility for the initiation, management, and/or financing
of a clinical trial”. An equivalent definition is given in the standard ISO 14155:2020.²
Both definitions are confined to the concept of Sponsor in the context of human clinical 
trials. 

As explained in the previous chapters, in silico methodologies can refine, reduce, or 
entirely replace human experimentation. This chapter focuses mainly on studies where 
in silico methodologies are used to refine or reduce human experimentation; in other 
words, studies that still involve humans. However, the Sponsor’s basic responsibilities 


1 https://www.ema.europa.eu/documents/scientific-guideline/ich-E-6-r2-guideline-good-clinical- 
practice-step-5 en.pdf. 

2 https://www.iso.org/standard/71690.html.
are applicable in all contexts, particularly regarding the requirements to implement a
thorough critical-to-quality risk assessment process and to assure the reliability of results.
Whatever the aim of the trial and its place in the development path of a medical treatment,
the Sponsor has an obligation to follow the fundamental principles of Good Clinical
Practice (GCP) and/or Good Laboratory Practice (GLP) and/or other GxP, over and
above the need to follow the present Good Simulation Practice.

In the context of this chapter, when referring to computer simulations or in silico
trials, we imply the use of models developed and validated according to the requirements
covered in Chaps. 3 and 4. We also assume that all regulatory requirements and
guidelines applicable to preclinical and clinical trials related to medicinal products or
medical devices are fulfilled. We will therefore focus on additional requirements to be
followed when including computer simulations/in silico trials in the development process
of new medical treatments.

A Sponsor wishing to include in silico trials in the frame of the pre-clinical and/or
clinical development of a new medical treatment should:


— extensively assess and clearly define the context of use of the in silico trial in the 
development path of its product; 

— allocate a project manager and adequate resources; 

— identify the computer simulations provider (internal/external); 

— draft the trial’s protocol; 

— analyse regulatory constraints and where necessary seek advice from regulatory 
authorities; 

— ensure continuous oversight of the project; 

— critically evaluate the study's outcome and discuss the results with the regulatory 
authority. 


8.2 Relevant Expertise 


The Sponsor may have internal technical resources/computational specialists or depend 
on computer simulations vendors/consultants. In any case, the Sponsor should have inter- 
nal personnel knowledgeable about computer modelling and simulations, at least to the 
extent needed for adequately assessing technical, regulatory, and logistic constraints. It is 
recommended that Sponsors with no prior experience in using in silico trials put in place 
a specific implementation plan, including basic training of personnel (e.g., attendance at
specific courses, study of available guidelines and documents, “hands-on” training) or
refer to a specialised consultant. 

In particular, the Sponsor of an in silico clinical trial, whether intended to refine, 
reduce, or replace human experimentation, should have adequately trained personnel capa- 
ble of performing the necessary credibility assessment for the in silico methodologies, 
follow available international guidelines and ensure that a quality management system
is in place throughout all trial stages.

Adherence by the Sponsor to specific training certification programs would be advis- 
able although of limited feasibility in practice due to the broad range of skills and 
experience required by in silico methodologies. To ensure nurturing of these in silico skills 
and experience, higher education institutions should revise their programs to include ele- 
ments of in silico medicine related to human health in all scientific degrees. They should 
also consider more specialised profiles that currently do not exist. The academic experts 
in in silico medicine should collaborate toward defining such curricula. 


8.3 Quality Management, Quality Assurance and Quality Control 


The Sponsor of an in silico trial should implement a critical-to-quality risk assessment 
process to ensure: 


— the protection of the rights, safety, and well-being of study participants (when these 
are involved), 

— the generation of reliable and meaningful results, and 

— the appropriate management of risk factors using a risk-proportionate approach. 


8.3.1 Risk Identification, Evaluation, Control, Communication, Review, 
and Reporting 


A basic set of factors relevant to ensuring trial quality should be identified for each study, 
focusing on critical factors. Examples of possible critical factors are: 


— Protocol development: the trial protocol should be scientifically sound and adequately 
sized, with well-defined and relevant endpoints and statistical methods. Study proce- 
dures and conditions for premature study interruption should be detailed. For hybrid 
studies, measures to protect study participants' rights, safety, and well-being should 
be defined, in addition to unambiguous identification of stopping rules for adaptive 
studies. Studies should also follow the respective good practice documents for the 
modalities other than in silico (e.g. GCP). 

— Selection of the clinical Investigators, as discussed in Sect. 8.6. 

— Selection of the modeller, as discussed in Chap. 9. 

— Trial monitoring/supervision, as discussed in Chap. 9. 

— Training of personnel: internal, CRO, and local study staff. 

— Data collection and analysis. 

— Data interpretation and reporting. 


Once identified, the risks should be evaluated with regard to their likelihood of occurrence,
their detectability, and their impact (risk evaluation). Factors
identified as critical to quality should be carefully evaluated in advance, and appropriate 
risk-mitigation activities should be put in place (risk control); in hybrid studies, these 
should be proportional to the impact of such factors on human subject protection and on 
the reliability of trial results. Quality management activities and periodic revision and re- 
assessment of critical factors should be documented. Any change to trial conduct deriving 
from corrective measures to mitigate critical risks should be documented and reported. 
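
The risk evaluation step described above can be sketched as a simple scoring scheme. The example below is only an illustration in the style of a failure mode and effects analysis (FMEA): the 1 to 5 scales, the multiplicative score, and the example scores are assumptions, not something this guidance prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CriticalFactor:
    """A critical-to-quality factor scored on hypothetical 1-5 scales."""
    name: str
    likelihood: int      # 1 = rare, 5 = almost certain
    detectability: int   # 1 = easily detected, 5 = practically undetectable
    impact: int          # 1 = negligible, 5 = severe

    def priority(self) -> int:
        # FMEA-style risk priority number: product of the three scores.
        for score in (self.likelihood, self.detectability, self.impact):
            if not 1 <= score <= 5:
                raise ValueError("scores must lie between 1 and 5")
        return self.likelihood * self.detectability * self.impact


def rank_factors(factors):
    """Return factors ordered from highest to lowest priority."""
    return sorted(factors, key=lambda f: f.priority(), reverse=True)


# Illustrative scores only; a real assessment would justify each value.
factors = [
    CriticalFactor("Protocol development", likelihood=2, detectability=2, impact=5),
    CriticalFactor("Data collection and analysis", likelihood=3, detectability=4, impact=4),
    CriticalFactor("Training of personnel", likelihood=3, detectability=2, impact=3),
]

for factor in rank_factors(factors):
    print(f"{factor.name}: {factor.priority()}")
```

Ranking the factors in this way gives one possible basis for making risk-control activities proportionate; the scores and their justification would still need to be documented and periodically reviewed.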

For pilot trials, it is recommended to set up an external, independent Data Safety Management
Board that periodically reviews data as they accumulate. Studies with adaptive
features and/or interim decision points need specific attention during proactive planning, 
ongoing review of critical quality factors, and risk management. 


8.3.2 Standard Operating Procedures 


The Sponsor should have in place a quality manual and written standard operating 
procedures (SOPs) to ensure that: 


— roles and responsibilities of the personnel (internal/external) are clearly defined and 
communicated; 

— the trial is carried out in compliance with the protocol and applicable regulations. Any 
deviation from the original plan is recorded, appropriately documented and justified, 
and its impact on the reliability of the results is properly assessed; 

— data generation, data collection, data handling, analysis and reporting are accurately 
managed to ensure data integrity and reproducibility; 

— the process of quality management is defined; 

— the process of vendor selection is defined. 


8.4 Contract Research Organisation (CRO) 


The ICH E6(R2) defines a CRO as “a person or an organisation (commercial, academic, or 
other) contracted by the sponsor to perform one or more of a sponsor's trial-related duties 
and functions". As previously discussed for the definition of Sponsor, in this chapter, the 
focus will be on the role of a CRO in the context of human clinical trials. Nonetheless, 
most of the topics discussed here are general and also applicable to CROs managing
pre-clinical trials.


8.4.1 Relevant Expertise


The use of CM&S in clinical studies will require a change in the current status quo of 
how CROs operate in drug development projects. 

CROs offering services for managing projects that include the adoption of CM&S 
in any context (like patient-specific models, virtual populations, hybrid trials, in-silico 
augmented clinical trials, etc.) should have adequate internal staff (highly preferable) or 
consultants with a good understanding of CM&S, in addition to the expertise in the man- 
agement of clinical trials. The role of the CRO may or may not include that of developing 
and running the models. Whenever the CRO also provides computer simulation services, 
relevant expertise and qualifications, as detailed in Chap. 9, must be ensured. 

When the CRO does not have an internal technical department with computational 
specialists, it might support the Sponsor in identifying the third-party vendor if required 
by the Sponsor. In all cases, the CRO should have a deep knowledge of applicable regula- 
tions, guidelines and best practices related to in silico trials and should remain constantly 
updated as knowledge in the scientific and regulatory fields progresses. 

Given the complexity of in silico clinical trials, it would be advisable that a specialised 
professional figure be dedicated to this type of study. 


8.4.2 Allocation of Roles and Responsibilities 


The Sponsor may transfer some or all of its responsibilities to a CRO, but the ultimate
responsibility for the quality and integrity of the trial remains with the Sponsor. When
delegating activities, including in silico activities, the Sponsor’s role is to provide the 
so-called Investigator’s Brochure (see Chap. 9) to the mandated Investigator. 
The allocation of responsibilities must be in writing, usually in the form of a contract. 
The Sponsor is also responsible for overseeing the activities performed by the CRO. 
Delegated activities may be related to: 


— trial design, 

— assessment of project feasibility and centres identification,

— model building and development,

— regulatory activities, 

— set-up of data collection tools, 

— sites initiation and training, 

— supervision of the trial conduct (simulations or in human studies),

— site monitoring,

— safety monitoring, 

— data handling and data privacy, 

— data analysis and reporting, 

— maintenance of trial documents. 


In addition, when the CRO provides computer simulation services, all responsibilities
detailed in Chap. 9 must be fulfilled. 


8.5 Adoption of Computer Simulations in the Definition 
of the Global Development Plan 


8.5.1 Pre-clinical Development Plan


Computer simulation services required to support preclinical studies should be adequately 
described in a plan, including a description of the in silico trial objectives, available 
knowledge and data, modelling and simulation methodology to be applied, and outcomes 
evaluation criteria. If the services are to be part of an application submitted to regulatory
bodies, computer simulation activities, including reporting, should be performed
according to the recommendations described in Chaps. 3 and 4.

CM&S activities should be integral to the sponsor's strategic preclinical development 
program for the medical product under consideration. 


8.5.2 Clinical Development Plan 


Computer simulation services required to support clinical development studies should be
performed in line with the recommendations provided in ICH E9 Statistical Principles
for Clinical Trials.³,⁴ Specific regulatory guidance documents should be consulted and
followed when including model-informed drug development approaches.⁵

Modelling activities aiming to analyse the data obtained from a clinical trial should 
be described in a specific plan, including a description of the objectives, modelling and 
simulation methodology to be applied, and outcomes evaluation criteria. The in silico 
trial plan should be finalised before the start of the trial. In silico trials should be integral 
to the sponsor's strategic clinical development program for the medical product under 
consideration. 


3 https://www.ema.europa.eu/documents/scientific-guideline/ich-E-9-statistical-principles-clinical-trials-step-5_en.pdf.

4 https://www.ema.europa.eu/documents/scientific-guideline/ich-e9-r1-addendum-estimands-sensitivity-analysis-clinical-trials-guideline-statistical-principles_en.pdf.

5 Madabushi, R., Seo, P., Zhao, L. et al. Review: Role of Model-Informed Drug Development
Approaches in the Lifecycle of Drug Development and Regulatory Decision-Making. Pharm Res
39, 1669–1680 (2022). https://doi.org/10.1007/s11095-022-03288-w.


8.6 Investigator Selection


In the context of this chapter, the Investigator is “a person responsible for the conduct 
of the clinical trial/clinical investigation at a trial site", as defined in the ICH E6(R2) 
and ISO 14155:2020. Here again, we differentiate the clinical Investigator (i.e., a non- 
computational specialist) from the modeller, which is discussed in Chap. 9. Although 
the role and responsibility of the clinical Investigator and the modeller are conceivably 
different, in the context of hybrid or adaptive clinical trials, the interplay between the 
two "Investigators" is crucial. There is so far limited experience with the inclusion of 
computer simulations in the context of clinical trials in human subjects. As experience is 
accumulated, such interplay will be formalised. 


8.6.1 General Requirements


The selection process of a clinical Investigator should consider the context of use of
the in silico trial (whether to reduce, refine, or partially replace clinical experiments), the
specificities and the complexity of the trial design, and follow a preliminary careful risk 
evaluation process. In particular, the selection of a clinical Investigator must take into 
consideration the role and the actual involvement of the Investigator: 


— The clinical Investigator is involved in human clinical trials run to validate predictive 
models, 

— The clinical Investigator is involved in a clinical trial simulation (e.g., use of synthetic 
control arm, virtual populations, digital twins), to inform or to complement the clinical 
trial, 

— The clinical Investigator participates in a hybrid in silico/in human trial.


Although a general understanding of modelling and simulation technologies is required
in all cases, the Investigator's level of knowledge of computer simulations should
be proportional to the risk: the higher the risk (which can be quantified with a risk
analysis such as the one included in the ASME VV-40:2018 standard), the more qualified
the Investigator should be.

Similar considerations apply to Investigator selection in the context of preclinical 
development. 


8.6.2 Investigational Centre Selection 


Depending on the Investigator's role and involvement, the selection process may require,
in addition to verifying the requirements established in the ICH E6(R2) and in the
ISO 14155:2020 for centre selection, additional checks to ensure
that the centre has adequate facilities for the in silico aspects. It is also important to ensure
that the Institution and the competent Independent Ethics Committee/Institutional Review
Board are well informed and involved in the process, particularly in the case of complex
trial designs.


8.7 Study Design, Setup, and Management 


The scope of this section is not to analyse and discuss the different possible designs of 
an in silico trial in the development of a medical product but to provide general guidance 
and overarching principles. 

An in silico trial design should align with the Clinical Development Plan established 
for that medical product and be preliminarily submitted for advice to regulatory author- 
ities. The regulatory pathway chosen depends on the clinical development plan and the 
proposed use of the data generated from the in silico trial. 

The Sponsor should provide an updated Investigator's Brochure detailing all available
information related to the medical product, including, where applicable, the results of
any CM&S performed.

A study-specific protocol with clearly defined endpoints, a rigorously described
methodology, and a proper statistical section must be in place. The rationale and the
model’s aim (context of use) should be well described, together with the level of model
risk, based on a risk-informed credibility assessment of the computational model. It is
recommended that the clinical Investigators are involved in designing the protocol and
defining the endpoints to ensure that clinical endpoints and engineering outputs are
well aligned. The clinical Investigator should also be consulted in preparing the patient’s
information leaflet and informed consent form, if applicable.

Before the start of the study, ethical and regulatory approvals—as appropriate—are to 
be obtained. Written agreements among all involved parties (e.g., sponsor, Investigators, 
institutions, CRO) defining the responsibilities of each party shall be in place. 

The general guidelines set in the ICH E6(R2) and in the ISO 14155:2020 should be fol- 
lowed for the study setup, including maintenance of study documents and documentation, 
the conduct of the study initiation visits and the training of site personnel. The extent of 
training on computational models for the study site personnel will be customised depend- 
ing on the specific involvement of the Investigators; in hybrid or adaptive clinical trials, 
there should be an ongoing interaction between the modeller and the clinical investigator. 

The Sponsor should define in a targeted monitoring plan the extent and nature of
monitoring appropriate for the study based on risk assessment (see Sect. 8.10).


8.8 Data Handling and Record Keeping


The Sponsor should utilise appropriately qualified (internal or external) individuals to 
handle and verify the data, conduct the computer simulations analyses, and prepare the 
trial reports. 

For electronic data handling and/or remote electronic trial data systems, the
recommendations included in Sect. 5.5 of the ICH E6(R2) should be followed.

The sponsor should retain all sponsor-specific essential documents in conformance 
with the applicable regulatory requirement(s) of the country(ies) where the product is 
approved and/or where the sponsor intends to apply for approval(s). 


8.9 Compliant GxP Computerised Systems 


GxP is an umbrella term that describes regulatory guidelines across the pharmaceutical and
medical device industries. The term encompasses a variety of regulatory guidelines such as
Good Laboratory Practices (GLPs), Good Clinical Practices (GCPs), Good Manufacturing
Practices (GMPs), Good Distribution Practices (GDPs), and Good Storage Practices (GSPs).

GxP compliance means establishing and documenting that the specified GxP requirements
of a computerised system can be consistently fulfilled. Validation should ensure accuracy,
reliability, and consistent intended performance from design until decommissioning of the
system or transition to a new system.

Digital systems used for trial purposes should consider the factors critical to their 
quality in their design and be fit for purpose. To this end, validation of systems, data pro- 
tection, information technology (IT) security and user management are essential elements 
to be addressed. 

Sponsors should maintain Standard Operating Procedures (SOPs) for using these sys- 
tems. SOPs should cover system setup, installation, and use. They should further describe 
system validation and functionality testing, data collection and handling, system main- 
tenance, system security measures, change control, data backup, recovery, contingency 
planning, and decommissioning. The responsibilities of the Sponsor, Investigator, and 
other parties concerning the use of these computerised systems should be clear, and the 
users should be provided with training in the use of the systems. 

Sponsors should further ensure the integrity of the data, including any data that 
describes the data's context, content, and structure. This is particularly important when 
changing computerised systems, such as software upgrades or data migration. 
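
One way to check that data and their accompanying metadata survive a migration unchanged is to compare cryptographic checksums taken before and after. The sketch below is a minimal illustration using Python's standard library; the file names, the directory layout, and the choice of SHA-256 are assumptions made for the example and are not requirements of any GxP guideline.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to its digest."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(before: dict, after: dict) -> list:
    """List files that are missing, added, or altered after migration."""
    return sorted(
        name for name in before.keys() | after.keys()
        if before.get(name) != after.get(name)
    )


# Demonstration with throwaway directories standing in for two systems.
with tempfile.TemporaryDirectory() as old, tempfile.TemporaryDirectory() as new:
    old_root, new_root = Path(old), Path(new)
    (old_root / "inputs.csv").write_text("patient,value\n1,0.42\n")
    (old_root / "inputs.meta.json").write_text('{"units": "mm"}')
    # Simulate a migration that silently alters the metadata file.
    (new_root / "inputs.csv").write_text("patient,value\n1,0.42\n")
    (new_root / "inputs.meta.json").write_text('{"units": "cm"}')
    differences = changed_files(build_manifest(old_root), build_manifest(new_root))

print(differences)
```

Including the metadata files in the manifest matters here: in the demonstration the data file is identical on both systems, and only the comparison of the metadata reveals the change.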

The Sponsor may transfer responsibilities of a computerised system to a Technology 
Service Provider. Still, the ultimate responsibility for the quality and integrity of the com- 
puterised system remains with the Sponsor. The allocation of responsibilities must be in 
writing, usually in the form of a contract. The Sponsor is also responsible for checking
that the SOPs of the Technology Service Provider meet the Sponsor's quality and
integrity standards and for overseeing its activities.


8.10 Monitoring Procedures


The role of a monitor in the frame of an in silico trial is so far not established. We assume 
that while a monitor has no role in the verification of the technical aspects of the model, 
he/she may be involved in ensuring that: 


— adequate documentation is produced and maintained during the running of the in silico 
trial, 

— the data used for the models can be tracked to the source, 

— the data used for the models are accurate and complete, 

— proper informed consent has been obtained from data subjects, where applicable. 


Depending on the type of in silico trial, these activities should complement standard 
monitoring activities performed for clinical trials according to current guidelines and 
regulations to which reference is made. 

In all cases, the Sponsor (or delegate) must develop a risk-based monitoring plan 
based on the risk assessment and tailored to the type and complexity of the study 
(pre- vs post-market) and its regulatory purpose. In addition to on-site monitoring, cen- 
tralised monitoring (i.e., a remote evaluation of accumulating data) should be implemented 
extensively to ensure data quality. 

The outcome of all monitoring activities must be documented in the form of reports,
which must be provided to the Sponsor in a timely manner for review and follow-up.

A special case is when the results of the double-blind clinical experimentation are also 
to be used to validate the predictive model. In such cases, the clinical data collected during 
the study have a double use: they inform the safety/efficacy of the new intervention being 
tested and validate a predictive model. These two activities have different requirements: 
the analysis to assess safety or efficacy usually takes place once the study is finished, 
whereas the validation of predictive models may require some of the data (those used as 
input for the model) to be disclosed to the modeller as soon as they are collected so that 
the prediction can be made before the validation data are collected (which minimises the
risk of bias). This creates a potential issue for the Sponsor, who would be asked to disclose
the treatment labels to the modeller while the trial is still running. Two possible solutions are:


— Patients’ assignment to study treatments is labelled as groups A and B. The key is 
disclosed to the modeller only, who is independent of the study team and bound to 
secrecy; 


or 


— the input data are identified and stored separately from the rest of the clinical data;

— the modeller is given access to this subset of the clinical data, but no label information;

— the modeller runs the simulations for each enrolled patient twice, once assuming
the patient has been treated and once assuming the patient was given the
placebo/comparator;

— once the study is completed and the labels are opened, the right simulation is chosen for 
each patient and compared to the clinically observed values to complete the validation 
study. 
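
The second option above can be sketched in a few lines. In the example below, `simulate` is a hypothetical stand-in for the predictive model, and the patient identifiers, baseline values, treatment effect, and observed outcomes are invented purely for illustration. The point is only the order of operations: both predictions are produced and frozen while the labels are blinded, and the matching one is selected after unblinding.

```python
def simulate(inputs: dict, arm: str) -> float:
    """Hypothetical stand-in for the predictive model.

    Returns a predicted outcome for one patient under the given arm.
    """
    effect = -5.0 if arm == "treatment" else 0.0
    return inputs["baseline"] + effect


def predict_both_arms(patients: dict) -> dict:
    """Run every patient under both arms while labels are still blinded."""
    return {
        pid: {arm: simulate(inputs, arm) for arm in ("treatment", "comparator")}
        for pid, inputs in patients.items()
    }


def select_after_unblinding(predictions: dict, labels: dict) -> dict:
    """Keep, for each patient, the prediction matching the actual arm."""
    return {pid: predictions[pid][labels[pid]] for pid in predictions}


# Input subset disclosed to the modeller during the trial (no labels).
patients = {"P01": {"baseline": 140.0}, "P02": {"baseline": 150.0}}
predictions = predict_both_arms(patients)  # frozen before validation data exist

# Available only once the study is completed and unblinded.
labels = {"P01": "treatment", "P02": "comparator"}
observed = {"P01": 136.0, "P02": 151.0}

selected = select_after_unblinding(predictions, labels)
errors = {pid: selected[pid] - observed[pid] for pid in selected}
print(errors)
```

Because the modeller never sees the labels, the treatment allocation stays blinded; the cost is that every patient has to be simulated twice.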


8.11 Audit 


One of the critical responsibilities of a Sponsor is to ensure oversight of any clini- 
cal trial-related duties and functions, including oversight of the external organisations 
to which some activities have been delegated (ICH Q10, 21 CFR 211, 21 CFR Part 
820.50). The Sponsor should draw up an audit plan tailored to the level of risk, focused
on critical-to-quality aspects identified in the risk assessment process. Appointed auditors 
must be independent of the Sponsor and qualified by documented training and experience 
to conduct audits. 

In particular, when computer simulations are outsourced to external vendors, auditors 
should have the technical expertise to verify critical aspects such as version con- 
trol for models and software, adherence to standards, and maintenance of adequate 
documentation. 
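
As an illustration of what a version-control check during such an audit might look like, the sketch below compares the recorded provenance of each simulation run against the approved configuration. The record fields, version strings, and run identifiers are hypothetical; an actual audit would rely on whatever the vendor's quality system records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunRecord:
    """Provenance of one simulation run (all field names are illustrative)."""
    run_id: str
    model_version: str
    solver_version: str
    protocol_version: str


def audit_runs(runs, approved: RunRecord) -> dict:
    """Return, per run, the fields that differ from the approved configuration."""
    findings = {}
    for run in runs:
        mismatches = [
            field for field in ("model_version", "solver_version", "protocol_version")
            if getattr(run, field) != getattr(approved, field)
        ]
        if mismatches:
            findings[run.run_id] = mismatches
    return findings


approved = RunRecord("approved", "2.1.0", "7.4", "1.0")
runs = [
    RunRecord("run-001", "2.1.0", "7.4", "1.0"),
    RunRecord("run-002", "2.1.1", "7.4", "1.0"),  # model version differs from approved
]
findings = audit_runs(runs, approved)
print(findings)
```

A mismatch is not necessarily a non-compliance; it is a prompt for the auditor to look for the corresponding documented and justified change.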

All findings will be reported to the Sponsor in an audit report to be shared with the 
audited party. A corrective and preventive action (CAPA) plan should be implemented 
and followed up for relevant findings. 


8.12 Non-compliance 


Non-compliance with the protocol, procedures, and regulations can be detected during 
monitoring or may be a finding from an audit. The Sponsor is responsible for assessing 
the relevance of the non-compliance and implementing proper corrective actions or
terminating the participation of a site/Investigator in the case of serious and/or repeated
non-compliance, notifying the regulatory authorities when required by the regulation.


8.13 Premature Termination or Suspension of a Trial 


The handling of a premature end of a study involving in silico methodologies is quite
similar to that used in conventional clinical studies.

The possible reasons for premature termination or suspension of a hybrid/adaptive/in
silico-augmented clinical trial should be described in the risk management plan and in the
study protocol. If appointed, the independent Data Safety Management Board should be
involved in evaluating potentially critical factors. If a decision is made to terminate
or suspend a trial, the Investigators and institution, regulatory authorities,
and Ethics Committees should be promptly informed and provided with the reason(s) for the
termination/suspension. The reasons for the premature termination/suspension of an in
silico trial not directly involving human subjects should also be documented.

There are fewer reasons for in silico trials to terminate early than for clinical trials.
Nevertheless, this could happen when: 


— In an in silico-augmented trial, the experimental observations made on the physical 
subjects enrolled in the study are not consistent with the predictions made for the 
virtual subjects. 

— It becomes clear that the envisioned potential cannot be demonstrated based on an 
interim analysis. 

— The sponsor terminates support and funding based on respective clauses in the 
agreements. 

— The simulation software is not supported anymore by the developers/vendors, and
issues or incompatibilities come up that do not allow completing the trial with
the existing version. Considering such a scenario, risk should be minimised during
model selection/development (see Chap. 3) but cannot be ruled out completely (e.g.,
bankruptcy).

— One of the study arms demonstrates a clear benefit in an interim analysis. Ethically, this
does not require trial termination as it could be completed without negative effects after
publicising the initial results. Nevertheless, the sponsor could decide to terminate for
economic reasons if the intended benefit has been demonstrated already.


Any premature trial termination requires a dedicated report documenting in detail the
reasons and circumstances, as well as the data acquired and analysed.


8.14 Trial/Study Reports


Whether the trial is completed or prematurely terminated, the sponsor should ensure that 
the trial reports are prepared and, if applicable, provided to the regulatory agency(ies). The 
sponsor should also ensure that the clinical trial reports are adequately in line with the 
standards of the ICH E3 Guideline for Structure and Content of Clinical Study Reports⁶
and model credibility assessment recommendations provided in Chap. 4. 


6 https://www.ema.europa.eu/documents/scientific-guideline/ich-E-3-structure-content-clinical-study-reports-step-5_en.pdf.


8.15 Essential Good Simulation Practice Recommendations


— The Sponsor of an in silico clinical trial, as well as the CRO that manages it, should
have the necessary technical expertise on staff.

— Computer modelling and simulation services required to support clinical development
studies should be performed as per the recommendations provided in ICH E9 Statistical
Principles for Clinical Trials and be in line with existing regulatory guidelines on the
use of CM&S in drug/medical device development plans.

— All computerised systems used in in silico clinical studies should be GxP-compliant. 


9.1 Roles and Responsibilities


In analogy to what is recommended by the ICH guideline on GCP for regular clinical 
trials, the roles, tasks, and responsibilities of the parties conducting a study involving in 
silico methodologies should be clearly stated and documented appropriately.¹,²

In a clinical study, the Investigator is the person who runs the study. The Investigator 
may help prepare and carry out the study's protocol (plan), monitor the study's safety, 
collect and analyse the data, and report the study's results. 

When in silico methodologies are involved, the term Investigator refers to the person,
or in some cases the hosting institution, in charge of carrying out the modelling
tasks and generating the in silico evidence. Experts who develop predictive models are
usually referred to as modellers, whereas experts who merely use models developed by
others are sometimes called analysts. Here we will refer to both roles interchangeably
with the term Investigator. Because the clinical Investigator and the in silico
Investigator may have different backgrounds, in clinical studies where in silico
methodologies are involved the two roles may be separated and assigned to different
persons or institutions.


1 https://database.ich.org/sites/default/files/E6_R2_Addendum.pdf.
2 https://database.ich.org/sites/default/files/ICH_E6-R3_GCP-Principles_Draft_2021_0419.pdf.


The Investigator may be in charge of performing the simulations and analysis but also 
of activities described in the model development plan (c.f. Chap. 3) and the credibility- 
building activities (c.f. Chap. 4). The Investigator's role and responsibilities are defined 
in relation to the Sponsor and their mutual agreement: 


— A documented agreement with the Sponsor should clarify the roles, responsibilities,
and frequency of reports at the beginning of the project.

— The Investigator should be aware of and comply with applicable modelling and 
simulation standards and guidelines, such as the ones listed in Annex A. 

— The Investigator/institution should have approval of the competent Institutional Ethics 
Committee and/or Review Board where required. The Investigator/institution and 
Sponsor share responsibility for the handling and protection of personal health data, 
together with the ethics committee. 

— The Investigator must follow the model development plan as agreed with the Sponsor. 
In case of deviation from plan, this should be discussed early on and agreed with 
the Sponsor in written form. If applicable, new approval and opinion from the ethics 
committee should be obtained (e.g., when the deviation regards personal health data 
acquisition, storage, or processing steps). 

— The Investigator is responsible for ensuring that the modelling and simulation activ- 
ities are carried out with adequate pre-defined hardware and software infrastructures 
for which the protocol and credibility assessment measures have been designed and 
approved (c.f. Chaps. 3 and 4). 

— Since part of the modelling activities may be delegated to third parties, it is the
responsibility of the Investigator or of the Sponsor to record any tasks that have been
delegated and the list of the qualified persons they were delegated to. In addition, the
Investigator or Sponsor is responsible for adequately informing each third party assisting
with the modelling and simulation process about the investigational product (see
Investigator's brochure), the modelled system and agreed protocols.


9.2 Investigator's Brochure


Similar to regular clinical trials, the Investigator must be informed by the Sponsor about
the medical product under investigation and subjected to the in silico trial. This can be
done by the Sponsor handing an Investigator's brochure to the Investigator,
as recommended in the ICH Good Clinical Practice guideline³ and the European Commission
Directives 2005/28/EC⁴ and 2001/20/EC.⁵

The Investigator's brochure summarises the medical product characteristics and com- 
piles existing clinical and non-clinical data (including pre-existing in silico data) about 
the medical products relevant to the study to facilitate understanding the rationale of the 
in silico trial (Dóerr et al., 2017). 

The information in the investigator's brochure shall be presented in a concise, simple,
objective, balanced, and non-promotional form that enables potential investigators to
understand it and make an unbiased risk-benefit assessment of the appropriateness of the
proposed in silico trial.


9.3 Investigator's Qualifications 


The Investigator needs to be qualified to fulfil his/her role. The required competencies 
range from practical skills regarding the use of the simulation software ("know your 
tools") to the capacity to judge whether the model at hand is suitable for the specific 
Context of Use (CoU), as detailed below. A lack of general understanding of the phys- 
iological processes and the lack of interdisciplinarity in the team are important pitfalls 
in applied modelling and simulation. In particular, the Investigator needs the following 
qualifications: 


— Capacity to judge whether the in silico model technique and its boundaries (Intended 
Use) are compatible with the clinical purpose and objectives (CoU). This assessment 
requires that the Investigator has access to information on the biomedical context of 
the study and pre-clinical information about the medical product being modelled. In 
this context, the Investigator's brochure is of particular importance (see Sect. 9.2). 

— Capacity to evaluate the adequacy of the modelling decisions to be taken during the 
design and the execution of the in silico trial and their implications for the intended 
CoU. This assessment includes biomedical and numerical aspects (for example, time 
and space resolution, convergence, and stability). When the expertise of the modeller 


3 https://database.ich.org/sites/default/files/E6_R2_Addendum.pdf. 
7 http://eur-lex.europa.eu/Lex UriServ/LexUriServ.do?uri=CELEX:32005L0028:EN:NOT. 


5 https://health.ec.europa.eu/medicinal-products/clinical-trials/clinical-trials-directive-200120ec_ 
en. 
on the pathophysiology of the biomedical process being simulated is not documented,
an expert on the specific pathophysiology in question should also be consulted. 

— Proficiency in using the M&S software for the in silico trial. We refer to Chap. 3 for 
cases where the software needs to be adapted. 

— Capacity to post-process, analyse, and condense the results of the in silico trial, 
including statistical analysis. 

— Capacity to identify relevant ethical aspects related to the in silico trial. These can be 
evaluated by the Investigator or discussed with the institutional ethics committee if 
required. 


Formal training, degrees, and certificates will often be evidence for many of these com-
petencies. However, there is no specific set of degrees or certificates that would be
comprehensive enough to cover all aspects and, at the same time, general enough to 
be applied to all fields of in silico medicine and the wide range of possible CoUs. Con- 
sidering the wide range of required qualifications, one person is unlikely to fulfil all of 
them on an expert level. The Investigator needs to ensure that all required competencies 
are available in the team of experts involved in the study. 

The qualifications of the people involved in a simulation study may need to be reported. 
For instance, in the NASA-HDBK-7009A⁶ about CM&S in mission-critical applications, it
is requested to "provide an understanding of the education and experience of the people 
developing and using the M&S" in a dedicated table. 

In conclusion, the Investigator needs to convince the relevant stakeholders (Sponsor, 
regulatory agencies, ethics committee) that the relevant qualifications are available. 


9.4 Adequate Resources 


To execute the in silico trial, or other modelling tasks agreed on with the Sponsor, accord-
ing to the state of the art, the Investigator needs access to human resources, support,
computing resources, and feedback. 

In most cases, human resources will be the most expensive part of the in silico activities 
budget. The Investigator and his/her team need to be funded adequately to be able to 
commit the required time to the execution of the in silico experiments and their analysis. 
The team needs to be formed with persons covering all the required qualifications as 
detailed in the previous section. 

For situations in which the expertise within the team is not sufficient or solutions can 
be obtained more efficiently with help from the outside, the Investigator should have 
access to external support. Demand for such support can arise in various fields, as is evident
from the wide range of required qualifications (see Sect. 9.3): for example, technical
support from the developers/vendors of the simulation software, support for collecting 


6 https://standards.nasa.gov/standard/NASA/NASA-HDBK-7009.
data, statistical support, support regarding ethical questions or legal and regulatory issues.
Resources need to be allocated to pay for such support in case this is not covered by 
existing agreements. The participation of external experts must be properly documented 
and tracked throughout the study. 

To run the in silico experiments, the Investigator depends on adequate computing 
resources. These can range from a personal computer to high-performance computing 
resources in a dedicated computing centre or in the cloud, depending on the characteris- 
tics of the model and the number of simulations to be run. The specific requirements need 
to be discussed and agreed upon with the Sponsor in good time. Remote access needs to 
be secured according to the state of the art. The choice of the computational platform 
must also be made keeping in mind the legal and ethical requirements that the treatment 
of sensitive data imposes. 

To ensure that the results of the in silico activities will be as valuable as possible, the 
Investigator should have access to feedback from experts of the biomedical context and/ 
or “users” of the results (e.g., physicians, product managers, regulatory agencies, etc.) 
during the modelling process unless explicitly designed differently in the study protocol. 


9.5 Records and Reports 


The steps taken and the decisions made during the modelling process are context-
specific and may be subjective, which can affect the conclusions and hinder the
reproducibility of the results (Erdemir et al., 2019). Therefore, as part of a quality approach and/
or in view of regulatory evaluation, the M&S tasks and decisions must be documented and
reported. Since the Investigator carries out these tasks, here we focus on his/her main
responsibilities concerning recording and reporting.

The Investigator must identify, justify, and document every expert-based choice poten- 
tially prone to modeller bias (e.g., parameter selection, model structure). All source 
documents, codes, results, and data should be adequately recorded, maintained, and 
retained by the Investigator/institution, with the support of the Sponsor, for the dura- 
tion initially agreed with the Sponsor. It is the responsibility of the Sponsor to agree 
in advance on an adequate period of time. The tasks that have been delegated should 
also be subject to recording. In addition, the Investigator must make all records available 
upon request of the Sponsor or relevant regulatory authorities. As such, the Investigator/ 
institution should ensure the adequate accessibility and legibility of documents and data 
and support audits. 

Regarding reporting activities, the Investigator should provide frequent written progress 
reports to the Sponsor as defined in the initial agreement. Those reports should document 
the technical progress and results, as well as potential model deficiencies, limitations, and 
ideas for improvements discovered during the process. Any deviation from the agreed 
protocol should also be justified and reported by the Investigator when it occurs.
Finally, the Investigator must provide the Sponsor and regulatory authorities with a
final report summarising the outcome of the in silico study after termination. This report 
should include the actual workflow employed by the Investigator, the generated in sil-
ico evidence, and its analysis concerning the CoU. The Investigator is responsible for
the scientific integrity of the reported research and data. The FDA has issued a guidance 
document providing modellers with a general outline for reporting computational mod-
elling and simulations in medical device submissions.⁷ Although detailed content may not
entirely apply to all types of in silico models (e.g., for drug approval submission), the gen- 
eral outline is rather generic. It may be considered for guiding the final report of in silico 
trials. In addition, the EMA provides guidelines to physiologically based pharmacokinetic 
modellers that describe the expected content of M&S reports for regulatory submissions.⁸
Similar guidelines were also released for reporting population pharmacokinetic analyses.⁹
Overall, the content is specific to drug applications, but many recommendations also apply 
to other modelling applications. 


9.6 Safety and Security 


In silico methodologies can be used to refine, reduce, or replace human experimentation. 

When in silico methodologies are used to reduce or replace human experimentation, 
given the digital nature of in silico trials, there is no direct involvement of human par- 
ticipants from which health-related safety issues could arise during the Investigator's 
modelling activities. The major risk for humans concerns personal data, which must be
handled according to data privacy standards and rules. 

Regarding simulation input measurements and validation activities, any clinical trial 
necessary to generate input data for the model is not the modeller's responsibility and 
should comply with other relevant guidelines, such as good clinical practices (GCPs). 
However, the Investigator must consider the safety and security aspects of in silico activ- 
ities. Data safety, i.e., protecting data against loss by ensuring safe storage and back-up 
of the data, must be ensured by the Investigator/institution. This applies to input (patient)
data as well as codes, analyses, results, records, and reports. Therefore, appropriate hardware
or cloud facilities with backup systems and version control should be available and used 
by the Investigator (see Sect. 9.4). The Investigator should follow the data storage and


7 FDA, Reporting of Computational Modeling Studies in Medical Device Submissions - Guidance
for Industry and Food and Drug Administration Staff. (2016).
https://www.fda.gov/regulatory-information/search-fda-guidance-documents/reporting-computational-modeling-studies-medical-device-submissions.

8 https://www.ema.europa.eu/en/reporting-physiologically-based-pharmacokinetic-pbpk-modelling-simulation.

9 https://www.ema.europa.eu/en/reporting-results-population-pharmacokinetic-analyses. 
version tracking strategy defined with the Sponsor in the model development plan (see
Chap. 3).
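
As an illustration of how the integrity of stored inputs, codes, and results might be tracked alongside backups and version control, the following minimal sketch computes a checksum manifest for a study folder and lists the files that changed between two snapshots. The folder layout, file names, and function names are hypothetical and are not prescribed by this chapter; a real study would rely on the storage and version tracking tools agreed in the model development plan.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file under root (relative path) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(old: dict, new: dict) -> list:
    """List files that were added, removed, or modified between manifests."""
    names = set(old) | set(new)
    return sorted(n for n in names if old.get(n) != new.get(n))


if __name__ == "__main__":
    # Hypothetical study folder created in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "inputs.csv").write_text("subject,weight_kg\nS001,70\n")
        (root / "model.py").write_text("print('run')\n")
        before = build_manifest(root)
        (root / "inputs.csv").write_text("subject,weight_kg\nS001,71\n")
        after = build_manifest(root)
        print(changed_files(before, after))
```

A manifest of this kind can be archived with each report so that an auditor can check that the retained files are the ones that were actually used.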

Data security is also the responsibility of the Investigator/institution and/or the Sponsor, 
who should protect personal health data and patient privacy by ensuring adequate use and 
access restriction to the data. As such, Investigators based in the European Union
must comply with the current General Data Protection Regulation (GDPR)¹⁰ and related
directives; most other countries now have similar legislation, although the details may
vary considerably. It should be noted that if the country where the data were collected is
subject to legislation different from that in force in the country where the data are being
processed, the processing of the data must follow the rules of the country where the data were
collected.

Specific attention must be paid to the level of data anonymisation and the possibil- 
ity of relating some of the data-derived model characteristics (e.g., organ geometry) to 
the patient's identity. The ethics committee will commonly evaluate the data security 
and privacy strategy and specific measures, which may require a full Data Protection 
Impact Assessment (DPIA). The Investigator should use patient data according to what 
was defined in the protocol and approved by the ethics committee. 

In the case of refinement of human experimentation, the in silico methodologies are 
used alongside the experimental ones, thus posing the same risks to the patients. A typ- 
ical example is a PKPD model used to calculate the next drug dose in an escalating 
dose-response study. In these cases, the study design should be reviewed by an ethics
committee before the study starts and should also adhere to other relevant good practice documents such as
Good Clinical Practice (GCP), as well as FDA and EMA guidelines on population
PK and exposure-response modelling. The documentation should include evidence that 
the in silico methodology is fully qualified from a regulatory standpoint. 

Finally, any safety issues related to the use of the model within its intended CoU, or
that emerge as a result of the simulation and/or are identified by the Investigator (see Chap. 3),
should be detailed in the report to the Sponsor and to the regulatory authorities.


9.7 Essential Good Simulation Practice Recommendations 


— The role and responsibilities of the Investigator are defined in relation to the Sponsor and
their mutual agreement, which should be documented.

— A record should be kept of any third parties contracted to assist in the CM&S
activities, and the Investigator should adequately inform them about the investigational
product.

— The Investigator needs to ensure and convince stakeholders that all relevant qualifica- 
tions are available in the team of experts involved in the study. 


10 https://eur-lex.europa.eu/eli/reg/2016/679/oj. 
— The Investigator needs access to human resources, support, computing resources, and
feedback necessary to accomplish the task as agreed with the Sponsor. 

— All source documents, codes, results, and data should be adequately version controlled, 
recorded, maintained, and retained by the Investigator/institution, with the Sponsor's 
support, for the duration initially agreed with the Sponsor. 

— The Investigator is responsible for providing regular and final written reports on the 
conduct of the study and its conclusions by following appropriate reporting guidance. 

— The Investigator and the Sponsor should implement proper data safety and security 
measures, complying with relevant regulations (GDPR, etc.). 
Annex: A Review of the Existing Regulatory
Guidance on the Use of Computational 
Models 


This position paper on Good Simulation Practice does not emerge from a vacuum. In the
last 20 years, regulatory agencies have issued guidance documents or technical standards for
specific uses of in silico methodologies. Here, we provide a short systematic review of
this existing body of knowledge. The goal is not to provide the details of these documents,
for which it is easier to consult the original documents directly, but only to provide for each
a brief summary, which allows practitioners to choose which of these documents might
be relevant for their purposes.

The types of models and related documentation that are covered in this short review are 
related to drugs (quantitative structure-activity relationship (QSAR) models, population 
pharmacokinetic (Pop-PK) models, exposure-response models including the Comprehen-
sive In Vitro Proarrhythmia Assay (CiPA) project, physiologically based pharmacokinetic
(PBPK) models, and disease-drug-trial models) and to medical devices (alternative to 
animal testing for artificial pancreases, risk of mechanical failure, magnetic resonance 
imaging (MRI) safety, guidance on reporting of computational modelling studies, cred- 
ibility assessment, acknowledgement in EU documentation for market access, Japanese 
guidelines for in silico methodologies). 

It should be noted that these documents were written by different organisations, at 
different times, and with different purposes; thus, the language used is not consistent. 
However, we decided to preserve in our summaries the original texts to respect the 
integrity of the document. 


Drugs—Quantitative structure-activity relationship (QSAR) models 

Quantitative structure-activity relationship models (QSAR models) are classification or 
regression models. In drug discovery they are used to identify molecular structures 
with low non-specific activity and good inhibitory effects on specific targets; they are
also used to estimate the octanol-water partition coefficient (logP), which is important informa-
tion for evaluating how a substance behaves with respect to factors like bioavailability
(druglikeness).


© The Editor(s) (if applicable) and The Author(s) 2024
The only piece of regulatory guidance available for this type of model is the following:


ENV/JM/MONO(2004)24. Report from the Expert Group on (Quantitative) Structure-
Activity Relationships on the Principles for the Validation of (Q)SARs. Paris, France:
Organisation for Economic Co-operation and Development (OECD) Expert Group on
QSARs.
https://read.oecd.org/10.1787/9789264085442-en?format=pdf 

Quantitative structure-activity relationship (QSAR) models are regression or classifica-
tion models used in the chemical and biological sciences and engineering. There are two
types of QSAR models: regression and classification. Like other regression
models, QSAR regression models relate a set of “predictor” variables (X) to the response 
(potency) variable (Y), while classification QSAR models relate the predictor variables to 
a categorical value of the response variable. In QSAR modelling, the predictors consist of 
physicochemical properties or theoretical molecular descriptors of chemicals; the QSAR 
response-variable could be a biological activity/potency of the chemicals. QSAR models 
are first developed based on a dataset of chemicals to describe the relationship between 
chemical structures and biological activity. Then, QSAR models can be used to predict 
the activities of new chemicals. QSAR models can be as simple as a statistical regression,
can involve molecular dynamics calculations (e.g., 3D-QSAR based on binding affinity), or
can be more complex and advanced models such as machine learning models. QSAR models
rarely include a mechanistic model of the physiology beyond the molecular scale: they 
capture either the mechanistic chemistry of the drug action at the molecular scale or build 
phenomenological relations with clinical endpoints. 

The OECD Principles for Validation of (Q)SAR for regulatory consideration are: 


— a defined endpoint

— an unambiguous algorithm

— a defined domain of applicability

— appropriate measures of goodness-of-fit, robustness and predictivity

— a mechanistic interpretation, if possible
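
To make some of these principles more concrete, the sketch below fits a one-descriptor regression QSAR and reports a goodness-of-fit measure (R²), a leave-one-out cross-validated Q² as a simple robustness measure, and a range-based applicability domain check. The descriptor and activity values are synthetic, and the range-based domain definition is only one simple choice assumed here for illustration; neither comes from the OECD report.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def r_squared(y, y_pred):
    """1 - (residual sum of squares) / (total sum of squares)."""
    y_bar = mean(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def loo_q_squared(x, y):
    """Leave-one-out Q2: each chemical is predicted by a model fitted without it."""
    preds = []
    for i in range(len(x)):
        intercept, slope = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(intercept + slope * x[i])
    return r_squared(y, preds)


def in_domain(x_new, x_train):
    """Range-based applicability domain: inside the training descriptor range."""
    return min(x_train) <= x_new <= max(x_train)


if __name__ == "__main__":
    # Synthetic data: one descriptor (e.g., logP) and a potency value.
    descriptor = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
    activity = [4.1, 4.6, 5.2, 5.4, 6.1, 6.4]
    intercept, slope = fit_line(descriptor, activity)
    fitted = [intercept + slope * d for d in descriptor]
    print("R2:", round(r_squared(activity, fitted), 3))
    print("Q2 (leave-one-out):", round(loo_q_squared(descriptor, activity), 3))
    print("2.2 in domain:", in_domain(2.2, descriptor))
    print("5.0 in domain:", in_domain(5.0, descriptor))
```

A Q² that is much lower than R² would suggest that the fit depends heavily on individual chemicals, and a query outside the applicability domain should not be predicted with the same confidence.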


Drugs—Pop-PK 
Pharmacokinetics (PK) investigates how a drug is absorbed, distributed, metabolized, and
eliminated from the body. Population pharmacokinetics (Pop-PK) models are informed
by concentration-time data from multiple individuals, frequently pooled across multiple
studies, and are used for allometric scaling, exposure-response analysis, bioequivalence
assessment, and many other purposes.

The pieces of regulatory guidance available for this type of model are:




— FDA-2019-D-2398 (CDER, CBER). Population Pharmacokinetics Guidance for 
Industry. 

— EMA/CHMP/EWP/185990/06. Committee for Medicinal Products for Human Use 
(CHMP). Guideline on reporting the results of population pharmacokinetic analysis. 


FDA-2019-D-2398 (CDER, CBER). Population Pharmacokinetics Guidance for Industry. 
https://www.fda.gov/media/128793/download. 

This FDA guidance focuses on how to conduct a Pop-PK analysis but contains limited
information regarding model validation. The section on model validation states: “Model
validation depends on the objective of the analysis and should follow a fit-for-purpose
approach”. Most of the recommendations refer to the types of plots that could be used
to present the validation results, such as goodness-of-fit (GOF) plots, dependent variable
versus the individual predictions plots, etc.


EMA/CHMP/EWP/185990/06. Committee for Medicinal Products for Human Use (CHMP). 
Guideline on reporting the results of population pharmacokinetic analysis. 
https://www.ema.europa.eu/en/reporting-results-population-pharmacokinetic-analyses#current-effective-version-section

Population pharmacokinetics (Pop-PK) is the study of variability in drug concentra- 
tions between individuals (healthy volunteers or patients). It comprises the assessment of 
variability within the population, associated with patient characteristics such as age, renal 
function, or disease state. The non-linear mixed effects modelling approach has become 
increasingly used for Pop-PK. The EMA “Guideline on reporting the results of population
pharmacokinetic analyses” assumes that such an approach is used. In contrast to the FDA guid-
ance on Pop-PK analyses, this guideline does not provide guidance on how to conduct
a Pop-PK analysis, but rather provides guidance on points to consider when developing the
analysis plan and the final analysis report.
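
To illustrate what between-subject variability and covariate effects mean in a mixed effects setting, the following sketch simulates a small virtual population with a one-compartment intravenous bolus model. Clearance depends on body weight through an allometric exponent and on a log-normally distributed random effect. All parameter values, the covariate model, and the function names are assumptions made for illustration and are not taken from either guideline. The sketch only simulates forward; estimating such parameters from data requires dedicated non-linear mixed effects software.

```python
import math
import random


def individual_clearance(cl_pop, weight_kg, eta, exponent=0.75):
    """Typical clearance scaled by weight (fixed effect) and exp(eta) (random effect)."""
    return cl_pop * (weight_kg / 70.0) ** exponent * math.exp(eta)


def concentration(dose_mg, volume_l, clearance_l_per_h, time_h):
    """One-compartment IV bolus: C(t) = Dose/V * exp(-(CL/V) * t)."""
    k_el = clearance_l_per_h / volume_l
    return dose_mg / volume_l * math.exp(-k_el * time_h)


def simulate_population(n_subjects, cl_pop=5.0, volume_l=50.0, omega=0.3,
                        dose_mg=100.0, time_h=4.0, seed=1):
    """Return (weight, clearance, concentration at time_h) for each virtual subject."""
    rng = random.Random(seed)
    subjects = []
    for _ in range(n_subjects):
        weight = rng.uniform(50.0, 100.0)   # hypothetical weight range, kg
        eta = rng.gauss(0.0, omega)         # between-subject variability
        cl = individual_clearance(cl_pop, weight, eta)
        subjects.append((weight, cl, concentration(dose_mg, volume_l, cl, time_h)))
    return subjects


if __name__ == "__main__":
    for weight, cl, conc in simulate_population(5):
        print(f"weight={weight:.1f} kg  CL={cl:.2f} L/h  C(4 h)={conc:.3f} mg/L")
```

In an analysis plan of the kind described in the guideline, the weight exponent would be one of the covariate models to be tested and the spread of the random effect one of the variability models.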

The analysis plan should at least include: 


— the objective(s) of the analysis 

— a brief description of the study (or studies) from which the data originates 

— the nature of the data to be analysed (how many subjects, rich or sparsely sampled) 

— the procedures for handling missing data and outlying data 

— the general modelling aspects (e.g., software, estimation methods, diagnostics) 

— the overall modelling procedure/strategy 

— the structural models to be tested (if this has been decided) 

— the variability models to be tested 

— the covariates and covariate models to be tested together with a rationale for testing 
these covariates based on, for example, biological, pharmacological and/or clinical 
plausibility. 

— the algorithms/methods to be used for covariate model building 
— the criteria to be used for the selection of models during model building and the inclu-
sion of covariates (e.g., objective function value, level of statistical significance, the 
goodness of fit plots, standard error, inter-individual variability, clinical relevance) 

— the model evaluation/qualification procedures to be used


The final report should include the following sections: 


— Summary 
— Introduction 
— Objectives 
— Data 

— Methods 

— Results 

— Discussion 


Drugs—dose-response models 

Whereas pharmacokinetics models predict how much of the drug reaches the target, pharma-
codynamics models predict the effect that the drug will produce on the target biological
system. An important category of pharmacodynamics models is that of dose-response models
(also known as exposure-response models).

The concepts of exposure and response are not always unequivocally defined. The broad
term exposure is used to refer to dose (the amount of a drug that enters the body) as well as to
various measures of acute or integrated drug concentrations in plasma and other biological 
fluids. Similarly, response refers to a direct measure of the pharmacologic effect of the 
drug. Response measures include a broad range of endpoints or biomarkers. 
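
As a simple illustration of an exposure-response relationship, the sketch below evaluates a sigmoid Emax model, in which the response rises from a baseline towards a maximum as the concentration increases. This functional form is a commonly used choice assumed here for illustration; it is not prescribed by the guidance discussed in this section, and the parameter values are hypothetical.

```python
def emax_response(concentration, e0, emax, ec50, hill=1.0):
    """Sigmoid Emax model: E = E0 + Emax * C^h / (EC50^h + C^h)."""
    if concentration < 0:
        raise ValueError("concentration must be non-negative")
    c_h = concentration ** hill
    return e0 + emax * c_h / (ec50 ** hill + c_h)


if __name__ == "__main__":
    # Hypothetical parameters: baseline 10, maximum effect 50, EC50 of 2 mg/L.
    for conc in [0.0, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0]:
        print(f"C={conc:5.1f} mg/L  response={emax_response(conc, 10.0, 50.0, 2.0):.2f}")
```

Here the exposure measure is a concentration, but the same function could be applied to dose or to an integrated exposure measure such as the area under the concentration-time curve, which is why the choice of exposure metric needs to be stated explicitly.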

EMA has guidelines only for specific products (e.g., EMA/CHMP/594085/2015, which
targets antimicrobials), whereas FDA has a general guidance.


FDA (CDER, CBER) 2003. Guidance for Industry: Exposure-Response Relationships— 
Study Design, Data Analysis, and Regulatory Applications. 
https://www.fda.gov/media/71277/download. 


Points to Consider in Study Design of Exposure-Response Analysis: 


— Crossover, fixed dose, dose response
   — For immediate, acute, reversible responses
   — Provides both population mean and individual exposure-response information
   — Safety information obscured by time effects, tolerance, etc.
   — Treatment by period interactions and carryover effects are possible; dropouts are
     difficult to deal with
   — Changes in baseline-comparability between periods can be a problem

— Parallel, fixed dose, dose response
   — For long-term, chronic responses, or responses that are not quickly reversible
   — Provides only population mean, no individual dose response
   — Should have a relatively large number of subjects (one dose per patient)
   — Gives good information on safety

— Titration
   — Provides population mean and individual exposure-response curves, if appropriately
     analysed
   — Confounds time and dose effects, a particular problem for safety assessment

— Concentration-controlled, fixed dose, parallel, or crossover
   — Directly provides group concentration-response curves (and individual curves, if
     crossover) and handles inter-subject variability in pharmacokinetics at the study design
     level rather than data analysis level
   — Requires real-time assay availability


In the process of PK-PD modelling, it is important to describe the following prospectively: 


— Statement of the Problem 

— Statement of Assumptions 

— Selection of the Model 

— Validation of the Model 

There are also recommendations on the structure of the reporting:

— The response variable and all covariate information 

— An explanation of how they were obtained 

— A description of the sampling design used to collect the PK and PD measures 

— A description of the covariates, including their distributions and, where appropriate, 
the accuracy and precision with which the responses were measured 

— Data quality control and editing procedures 

— A detailed description of the criteria and procedures for model building and reduction, 
including exploratory data analysis. 


Drugs—Extrapolation models 
EMA/189724/2018. Reflection paper on the use of extrapolation in the development of 
medicines for paediatrics. 
https://www.ema.europa.eu/en/extrapolation-efficacy-safety-paediatric-medicine-development#current-version-section
Extrapolation is defined as ‘extending information and conclusions available from stud- 
ies in one or more subgroups of the patient population (source population(s)), or in related 
conditions or with related medicinal products, in order to make inferences for another 
subgroup of the population (target population), or condition or product, thus reducing the 
amount of, or general need for, additional evidence generation (types of studies, design
modifications, number of patients required) needed to reach conclusions.’ While the focus
of discussion here is on extrapolation for the development of medicines in children, the 
underlying principles may be extended to other areas. 


Extrapolation Concept 


Existing information about the disease, the drug pharmacology and the clinical response 
to treatment should be collated across studies and target populations. Factors that might 
impact the effects of treatment from different studies and target populations should be 
identified. 

The primary focus will usually be to establish a line of reasoning about the relation- 
ship between exposure and clinical responses. Where data are available to establish that 
a relationship (e.g., exposure-response) in the target population is similar to that in the study
population, the knowledge gained from the study population can be incorporated into the
extrapolation concept and will not need to be addressed in the extrapolation plan. 

For other relationships or factors, reliable and informative data might not be available. 
These gaps in knowledge give rise to assumptions in the extrapolation concept that need 
to be investigated in the extrapolation plan before the extrapolated effects of treatment in 
the target population can be considered as a sound basis for regulatory decision-making. 

Where possible, quantitative methods should be used for the collation of available 
data and the investigation of potential modifiers of the treatment effect. A structured 
extrapolation plan should be provided. 


Extrapolation Plan 


The gaps in knowledge and the assumptions identified in the extrapolation concept deter-
mine the objective(s) and methodological approaches for the tests and trials that need to
be conducted to draw inferences that are relevant for the target population. These tests 
and trials should be conducted to generate evidence that strengthens and ultimately, based 
on success criteria, confirms the extrapolation concept. Specifically, the extrapolation plan 
will address whether regulatory decisions can rely on the initial, or revised, expectations 
on the effects of treatment in the target population, or if more data need to be generated. 

Extrapolation plans will differ according to the extent of assumptions in the extrapola- 
tion concept. Data in the study population might establish that there are so few important 
modifiers of the treatment effect that clinical outcome can be predicted through similarity 
in drug exposure or in the magnitude of PD response. Alternatively, data from the source 
population might be limited such that the influence of one or more factors needs to be 
investigated through generation of some additional clinical data from the target population. 
The extreme case would be where gaps in knowledge might be such that extrapolation is 
not a viable approach. 




Mitigation of Uncertainty

Whilst conclusions from an extrapolation approach can give a sound basis for regula-
tory decision making, the data generated may not be sufficient to address all uncertainties
related to a specific research question for the target population. For example, an acceptable 
degree of patient benefit on short-term efficacy outcomes, sufficient to support authorisa- 
tion, might be established based on an extrapolation approach, but quantification of how 
this effect translates into longer-term outcomes might not be available. When there is a 
well-reasoned scientific uncertainty to be addressed to enhance the understanding of the 
effect of treatment with implications for better labelling and better use in clinical practice, 
the extrapolation plan can continue post-authorisation to reduce the identified uncertainty. 


Drugs—PBPK 

As mentioned above, whereas pharmacokinetics models predict how much of the drug reaches
the target, pharmacodynamics models predict the effect that the drug will produce on
the target biological system. While PK models were historically developed without any
mechanistic assumptions, by simply fitting experimental data with statistical models,
Physiologically Based Pharmacokinetic (PBPK) models predict the absorption, distri-
bution, metabolism and excretion of a drug by relying on the mechanistic knowledge that
anatomy, physiology, physics and chemistry can provide. In most cases PBPK models are
so-called grey-box models, in the sense that they are built combining mechanistic and empirical
(e.g., data-driven) knowledge.
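
The grey-box idea can be illustrated with a deliberately minimal sketch: a blood compartment exchanging drug with a liver compartment through blood flow, with elimination in the liver. The flow and volumes play the role of mechanistic, system-dependent parameters, whereas the partition coefficient and intrinsic clearance stand for drug parameters that would typically be estimated from data. The two-compartment structure, the explicit Euler integration, and all values are simplifying assumptions for illustration and do not represent any validated PBPK platform.

```python
def simulate_minimal_pbpk(dose_mg, duration_h, dt_h=0.001):
    """Return the amounts (mg) in blood, in liver, and eliminated after duration_h."""
    # Hypothetical system-dependent (physiological) parameters.
    q_liver = 90.0   # hepatic blood flow, L/h
    v_blood = 5.0    # blood volume, L
    v_liver = 1.8    # liver volume, L
    # Hypothetical drug-dependent (empirical) parameters.
    kp_liver = 2.0   # liver-to-blood partition coefficient
    cl_int = 30.0    # intrinsic hepatic clearance, L/h

    a_blood, a_liver, a_elim = dose_mg, 0.0, 0.0   # IV bolus into blood
    for _ in range(round(duration_h / dt_h)):
        c_blood = a_blood / v_blood
        c_liver_out = a_liver / v_liver / kp_liver   # concentration leaving the liver
        uptake = q_liver * (c_blood - c_liver_out)   # net flow-limited exchange, mg/h
        metabolism = cl_int * c_liver_out            # hepatic elimination, mg/h
        a_blood -= uptake * dt_h
        a_liver += (uptake - metabolism) * dt_h
        a_elim += metabolism * dt_h
    return a_blood, a_liver, a_elim


if __name__ == "__main__":
    blood, liver, eliminated = simulate_minimal_pbpk(dose_mg=100.0, duration_h=2.0)
    print(f"blood={blood:.2f} mg  liver={liver:.2f} mg  eliminated={eliminated:.2f} mg")
    print(f"mass balance: {blood + liver + eliminated:.6f} mg")
```

Checking that the three amounts add up to the administered dose is a basic verification step; it says nothing about whether the parameter values are appropriate for a given drug or population.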


EMA/CHMP/458101/2016. Guideline on the reporting of physiologically based pharma- 
cokinetic (PBPK) modelling and simulation. 
https://www.ema.europa.eu/en/reporting-physiologically-based-pharmacokinetic-pbpk-modelling-simulation#current-effective-version-section.

The guideline specifies the essential information that needs to be included when
reporting PBPK modelling and simulation studies:


— Objective and regulatory purpose
— Background information
— Qualification
— Model parameters
   — Assumptions
   — System-dependent parameters
   — Drug parameters and the drug model
— Model development
— Simulation of the intended scenario
— Platform and drug model evaluation
   — Sensitivity analyses
   — Evaluation of the predictive performance of the drug model
— Results
— Discussion of the regulatory application


In particular, with respect to model evaluation, the EMA wrote: “A comprehensive sum- 
mary of the system and drug model evaluation should be provided. A thorough evaluation 
of the drug model is important if the model is to be used to simulate novel situations, 
e.g., drug interaction or PK in a different population. An evaluation of the model should 
be presented in appendix with sufficient detail in the report to support confidence for 
regulators in the application of the model in their decision-making.”

The appendix provides some additional recommendations:


— The validation should include the investigational drug PBPK model (treatment model). 
— Validation studies should be done against human experiments with multiple doses. 
— Simulation should be performed on populations of interest of at least 100 subjects. 


The guidance suggests the types of plots to be used to compare predictions to experiments. 

The acceptance criteria (adequacy of prediction) for the closeness of the comparison of
simulated and observed data depend on the regulatory impact and need to be considered
separately for each application.


FDA/CDER/2018. Physiologically Based Pharmacokinetic Analyses—Format and Content.
Guidance for Industry.


The FDA guidance covers: 


— Overview of Modeling Strategy 

— Modeling Parameters 

— Simulation Design 

— Electronic Files and Other Documentation 
— Software 

— Model Verification and Modification 

— Model Application 


The introduction section should provide: 


1. a high-level synopsis of the drug’s physicochemical, PK, and PD properties; 

2. the exposure-response relationships for the efficacy and safety of the drug to the extent 
that they are known; 

3. a brief PBPK-related regulatory history (i.e., prior interactions with the FDA and other
regulatory agencies) to provide context for the PBPK analyses;

4. cross-referencing to PBPK study reports previously submitted to the FDA for different
intended uses at different stages of the development of the same drug substance or the 
same drug product. 


The Materials and Methods section should include "sufficient information to allow FDA 
reviewers to duplicate and evaluate the submitted modelling and simulation results 
and to conduct supplemental analyses when necessary." 

Electronic files related to modelling software and simulations should be submitted 
along with the PBPK study report. 


FDA/CDER draft guidance October 2020. The Use of Physiologically Based Pharma- 
cokinetic Analyses —Biopharmaceutics Applications for Oral Drug Product Development, 
Manufacturing Changes, and Controls—Guidance for Industry. 
https://www.fda.gov/media/142500/download 

This guideline covers quality by design (QbD) principles and proposes that the
application of PBPK modelling could be expanded to pharmaceutical drug product
development, manufacturing changes, and controls. It applies to oral formulations
only.

In addition to the general considerations (which follow a similar structure as in the pre- 
viously described guideline), specific applications of PBPK modelling to support product 
quality are described: 


1. Development of Clinically Relevant Dissolution Specifications (Method and Accep- 
tance Criteria): 
a. Aid in Biopredictive Dissolution Method development 
b. Support Clinically Relevant Dissolution Acceptance Criteria 

2. Establishment of Clinically Relevant Drug Product Quality Specifications (Other Than 
Dissolution) 

3. Quality Risk Assessment for Pre- and Post-approval Changes and Risk-Based 
Biowaivers 


Drug—Chemicals—PBK 
OECD Guidance document on the characterisation, validation and reporting of Physiologically
Based Kinetic (PBK) models for regulatory purposes. Adopted April 2021.
https://www.oecd.org/chemicalsafety/risk-assessment/guidance-document-on-the-characterisation-validation-and-reporting-of-physiologically-based-kinetic-models-for-regulatory-purposes.pdf

This OECD guidance supports the use of PBK models for chemical risk assessment as
an alternative to animal testing. PBK models are the same as PBPK models; the change
in terminology reflects the fact that the OECD targets not only drug development but also
the safety assessment of chemical products, so the terminology is more generic.

It describes the key steps for characterizing and validating such a model to improve
model credibility and communication between modelers and regulators, but it does not
provide good practices for model development. Interestingly, this guidance accounts for
the fact that PBK models are most often calibrated with in vitro or in silico data, as in vivo
kinetic data may not be available. However, it explicitly states that assessing
goodness-of-fit and predictivity for PBK models requires in vivo kinetic data, without
stating whether these data should be prospective or retrospective. It also discusses
validation as a term that may be understood differently by model developers and regulators.

First, a regular PBK modelling workflow is described. Notably, it includes a step called 
model performance, which covers model validation, sensitivity, variability and uncertainty 
analyses, and predictive capacity. 

The regulatory assessment part explains what is considered in the validation of a PBK 
model, taking into account the CoU. It provides two tools: (1) a model reporting template 
for model developers and (2) an evaluation checklist of model applicability for regulators. 

The recommendations from the template for reporting are the following: 


— Name of model 

— Model developer and contact details 

— Summary of model characterization, development, validation, and regulatory applica- 
bility 

— Model characterization (following the steps of the aforementioned modelling work- 
flow) 

— Identification of uncertainties 

— Model implementation details 

— Peer engagement (input/review) 

— Parameter tables 

— References and background information 


The checklist for regulators is split into a context/implementation section and an
assessment of validity section. The latter covers the biological basis of the model, the
theoretical basis of model equations, reliability of input parameters, uncertainty &
sensitivity analysis, and goodness-of-fit & predictivity.
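As a rough illustration of the sensitivity analysis step, the Python sketch below estimates normalised local sensitivity coefficients by central finite differences. The one-compartment AUC formula is a deliberately simple stand-in for a full PBK model, and the parameter values and function names are assumptions introduced here, not part of the OECD guidance.

```python
def auc_one_compartment(dose: float, clearance: float) -> float:
    """AUC(0-inf) after an IV bolus in a one-compartment model: dose / clearance."""
    return dose / clearance


def normalised_sensitivity(model, params: dict, name: str, rel_step: float = 0.01) -> float:
    """Central finite-difference estimate of (dY/Y) / (dp/p) for parameter `name`."""
    base = model(**params)
    up = dict(params)
    up[name] = params[name] * (1.0 + rel_step)
    down = dict(params)
    down[name] = params[name] * (1.0 - rel_step)
    return (model(**up) - model(**down)) / (2.0 * rel_step * base)


# Hypothetical parameter values chosen only to exercise the function.
params = {"dose": 100.0, "clearance": 5.0}
coefficients = {p: normalised_sensitivity(auc_one_compartment, params, p) for p in params}
```

A coefficient close to +1 or -1 means the output changes roughly in proportion to the parameter, which helps to decide which input parameters deserve the closest scrutiny in the reliability assessment.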


CiPA: Comprehensive In Vitro Proarrhythmia Assay 

While the CiPA project has not produced guidelines yet, it is worth mentioning it here. The 
project, a collaboration between various regulatory agencies including FDA and EMA, 
aims to define a new approach to evaluate the risk that a new drug may cause torsades de 
pointes (TdP), an abnormal heart rhythm that can lead to sudden cardiac death, based on a 
suite of in vitro assays coupled to in silico models of cardiac electrophysiologic activity. 
The project will provide data to ICH to update the S7B and E14 guidances. The most
recent document is the one below, which provides some guidelines for in silico models.


ICH E14/S7B Implementation Working Group: Clinical and Nonclinical Evaluation of QT/QTc
Interval Prolongation and Proarrhythmic Potential Questions and Answers. Draft
version, Endorsed 27 August 2020
https://www.ema.europa.eu/en/documents/scientific-guideline/ich-guideline-e14/s7b-clinical-nonclinical-evaluation-qt/qtc-interval-prolongation-proarrhythmic-potential-questions-answers-step-2b_en.pdf

“The following general principles should be applied to all proarrhythmia risk predic- 
tion models intended to be used as part of an integrated risk assessment for regulatory 
purposes. While the main focus of these principles is to evaluate a model's predictivity of 
TdP risk, they are general enough to guide the development of models predicting different 
types of proarrhythmia. 


1. A defined endpoint consistent with the context of use of the model. 

2. A defined scope and limitations of the model. This includes the experimental protocols 
to generate model input (experimental data capturing pharmacological effect of drug), 
and the compounds tested should have the same arrhythmic mechanisms covered by 
the model. 

3. A prespecified analysis plan and criteria to assess model predictivity. The analysis plan 
should include methods to separate the training and validation steps. In the training 
step, a series of reference compounds is used to adjust the model. In the validation 
step, another series of reference compounds is used to evaluate the performance of 
the pre-specified model. The reference compounds used for the training and validation 
steps should not overlap. 

4. A fully disclosed algorithm to translate experimental measurements (model input) to 
proarrhythmia risk (model output), allowing independent reproduction of the model 
development process using the associated training and validation datasets to re-evaluate 
the model performance. 

5. The uncertainty in the model inputs should be captured and propagated to the model 
predictions. The experimental variability associated with model input should be quan- 
tified using appropriate statistical methods and then translated into probabilities of the 
predicted risk. 

6. A mechanistic interpretation of the model, which describes the relationship between 
the model inputs and mechanism for the arrhythmia.” 
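Principles 3 and 5 above can be sketched in a few lines of Python: a check that training and validation compounds do not overlap, and a Monte Carlo propagation of input variability into probabilities of the predicted risk category. Everything specific here is hypothetical: the input names, the toy risk metric, the thresholds, and the normal distributions are assumptions for illustration and do not reproduce any CiPA model.

```python
import random
from collections import Counter


def check_disjoint(training: set, validation: set) -> None:
    """Raise if a reference compound is used for both training and validation."""
    overlap = training & validation
    if overlap:
        raise ValueError(f"compounds in both sets: {sorted(overlap)}")


def toy_risk_metric(inputs: dict) -> float:
    """Hypothetical stand-in for a fully disclosed risk algorithm."""
    return inputs["outward_block"] - inputs["inward_block"]


def classify(metric: float, low_cut: float = 0.1, high_cut: float = 0.3) -> str:
    """Map the metric to a risk category using assumed thresholds."""
    if metric < low_cut:
        return "low"
    if metric < high_cut:
        return "intermediate"
    return "high"


def risk_probabilities(means: dict, sds: dict, n_samples: int = 5000, seed: int = 0) -> dict:
    """Sample the inputs from normal distributions and count the predicted categories."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_samples):
        sample = {name: rng.gauss(means[name], sds[name]) for name in means}
        counts[classify(toy_risk_metric(sample))] += 1
    return {cat: counts[cat] / n_samples for cat in ("low", "intermediate", "high")}


check_disjoint({"compound A", "compound B"}, {"compound C"})
probabilities = risk_probabilities(
    means={"outward_block": 0.35, "inward_block": 0.10},
    sds={"outward_block": 0.05, "inward_block": 0.05},
)
```

Reporting a probability per risk category, rather than a single label, is one simple way to make the experimental variability of the inputs visible in the model output.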


CM&S for Medical devices—alternative to animal testing for artificial pancreas 


Kovatchev BP, Breton MD, Dalla Man C, Cobelli C. In silico model and computer sim- 
ulation environment approximating the human glucose/insulin utilization. Food and Drug 
Administration Master File MAF 1521. 2008 
https://moodle.adaptland.it/pluginfile.php/20224/mod_data/content/42094/In-Silico%20Application%20-%20paper%201.pdf

In January 2008, a computer simulator of type 1 diabetes mellitus was accepted by the
FDA Center for Devices and Radiological Health (CDRH) as a substitute for animal trials
for the preclinical testing of control strategies in artificial pancreas studies.¹ Soon after, a
first investigational device exemption was granted by the FDA for a closed-loop control
clinical trial on the basis of results from this simulation tool. To the authors’ knowledge,
this is the first instance of a regulatory decision on medical devices where a computer
model prediction was accepted as a replacement for an in vivo experiment.


Medical devices—risk of mechanical failure 

ASTM F2514-08, Standard Guide for Finite Element Analysis (FEA) of Metallic Vascular
Stents Subjected to Uniform Radial Loading, ASTM International, West Conshohocken, PA, 
2008, www.astm.org 

The American Society for Testing and Materials (ASTM) F2514-08 was the first technical
standard related to the use of a physics-based model to predict the performance of
medical devices. According to ASTM, the purpose of the guide is to "establish recom- 
mendations and considerations for the development, verification, validation, and reporting 
of structural finite element models used in the evaluation of the performance of a metallic 
vascular stent design undergoing uniform radial loading. This standard guide does not 
directly apply to non-metallic or absorbable stents, though many aspects of it may be 
applicable. The purpose of a structural analysis of a stent is to determine quantities such 
as the displacements, stresses, and strains within a device resulting from external loading, 
such as crimping or during the catheter loading process, and in-vivo processes, such as 
expansion and pulsatile loading". 

Published in 2008, the standard establishes general requirements and considerations for 
using Finite Element Analysis techniques for the numerical simulation of metallic stents 
subjected to uniform radial loading. 

The basic idea was that for highly standardised experimental bench tests (for stents 
the ASTM F2477-07: Standard Test Methods for in vitro Pulsatile Durability Testing 
of Vascular Stents), a finite element model could reliably predict the outcome of the 
experiment, and thus be used to reduce and refine, and even in some low-risk cases, 
replace the bench experiment itself. 

While the experimental tests being supplemented were extremely simple, the adop- 
tion of this early standard introduced in the regulatory space the concept that a model 
prediction could be used in place of an experiment within the regulatory process. 


FDA-2020-D-0957 Non-Clinical Engineering Tests and Recommended Labeling for
Intravascular Stents and Associated Delivery Systems—Guidance for Industry and FDA Staff
https://www.fda.gov/media/71639/download


1 https://doi.org/10.1177/193229680900300106.

This is not a guideline specifically focused on computational methods, but it includes
some non-binding recommendations for the use of modelling in the pre-clinical assess- 
ment of stents: 


— You should establish protocols for all experiments or computational analyses, including 
acceptance criteria when applicable, before you perform the tests 

— We recommend that you determine the stress-strain response, endurance limit,
and post-processing mechanical properties through physical experiments or compu- 
tational models that simulate stent material properties, manufacturing, and deployment 
processes. 

— FDA recommends that you include the following elements in your stress/strain analysis 
and test report for each stent design. 
— Computational Model and Inputs 

— We recommend that you clearly identify and explain the sources and values of 
all inputs and assumptions used to create the stress/strain analysis model. You 
should identify any software used for analysis. We recommend that finite ele- 
ment analysis reports include the element types used to model the stent, loading 
surfaces, and boundary conditions. We also recommend that you indicate if mesh 
refinement analysis was performed and clearly describe how you model the sur- 
rounding vessel/tissue and the type of contact elements used. Specifically, we 
recommend that you consider the following: 

— Model Geometry 

— We recommend that you clearly describe the stent and vessel geometry used. If 
symmetry is used, we recommend that you explain why this is appropriate for 
your model. 

— If you do not model all of your stent sizes, we recommend that you explain why 
the modelled stent size is the worst case with respect to critical stresses. We 
recommend that you address the effect of dimensional variation within allow- 
able tolerances on the results of the stress/strain analysis (i.e., maximum critical 
stress). 

— We recommend that you provide a justification for the physiological relevance 
of your vessel model parameters (e.g., vessel compliance). 

— Type of Element & Mesh Refinement Analysis 

— We recommend that you specify the number and type of elements used in your 
mesh, including any mesh refinement in transition regions or regions of complex 
geometry. 

— We recommend that you perform a mesh refinement analysis to ensure that 
the solution is independent of element size. If you do not believe mesh refine- 
ment analysis is necessary for your model, we recommend that you provide a 
justification for not conducting such an analysis. 

— Contact Elements

— We recommend that you specify the type of contact defined between any 2 contacting
bodies modelled in your analysis; e.g., the vessel and outer surface of
your stent. 

— Material Properties (Constitutive Model) 

— We recommend that you clearly describe the material stress/strain behaviour of 
your stent in graphical and equation form. This discussion should include, but is 
not limited to the following considerations: 

— Linear versus non-linear 

— Isotropic versus anisotropic 

— Temperature-dependent behaviour of raw versus processed material. 
— Finite Element Analysis (FEA) Validation 

— We recommend that you validate your FEA (material properties, geometry, 
and boundary conditions) with experimental bench testing. For example, you 
could perform radial loading of your device and compare the force-displacement 
results with FEA of a simulated radial loading experiment. 
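The mesh refinement analysis recommended above can be illustrated with a short Python sketch that takes one quantity of interest from three successively refined meshes and estimates the observed order of convergence and a Richardson-extrapolated value. The strain values, the refinement ratio of 2, and the function names are assumptions for illustration; the formulas also assume monotonic convergence and a constant refinement ratio, which a real analysis would need to verify.

```python
import math


def observed_order(coarse: float, medium: float, fine: float, ratio: float) -> float:
    """Observed order of convergence from three meshes with a constant refinement ratio."""
    return math.log(abs(coarse - medium) / abs(medium - fine)) / math.log(ratio)


def richardson_extrapolate(medium: float, fine: float, ratio: float, order: float) -> float:
    """Estimate the mesh-independent value from the two finest solutions."""
    return fine + (fine - medium) / (ratio ** order - 1.0)


def relative_change(medium: float, fine: float) -> float:
    """Relative difference between the two finest solutions."""
    return abs(fine - medium) / abs(fine)


# Hypothetical peak strain values (%) from coarse, medium and fine meshes.
coarse, medium, fine = 1.16, 1.04, 1.01
order = observed_order(coarse, medium, fine, ratio=2.0)
extrapolated = richardson_extrapolate(medium, fine, ratio=2.0, order=order)
change = relative_change(medium, fine)
```

A small relative change between the two finest meshes, together with an observed order close to the theoretical order of the element formulation, is one common way to argue that the solution is independent of element size.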


FDA-2019-D-1261. Technical Considerations for Non-Clinical Assessment of Medical 
Devices Containing Nitinol 
https://www.fda.gov/media/123272/download 

This guidance also includes some recommendations for the use of computational 
models: 


— 2. Computational Stress/Strain Analyses 
— If you plan to conduct computational analyses, we recommend the following to 
ensure the unique thermomechanical properties of nitinol are properly captured: 

— a. The constitutive laws applicable to nitinol can differ substantially from tradi- 
tional metals. Therefore, you should simulate nitinol material with an appropriate 
material model. You should document and justify the parameters used in the 
material model. 

— b. Material model parameters can be obtained from ASTM F2516 “Standard 
Test Method for Tension Testing of Nickel-Titanium Superelastic Materials.” 
Test specimens should be representative of the final manufactured device (e.g., 
including heat treatment and surface processing steps). Testing should be con- 
ducted at a temperature representative of the clinical use environment (e.g., 37 °C 
for implantable devices). 

— c. Your computational analysis should include the effect of any shape setting 
steps in your manufacturing process since these will relieve pre-existing stresses. 

— d. If your device is subjected to cyclic loading during use, we recommend that 
you calculate fatigue safety factor(s) using a constant life curve. Unlike tradi- 
tional metals, which utilize stress-based fatigue life estimates (e.g., Goodman, 
Soderberg diagrams), using a constant life mean versus alternating strain diagram
has been found to provide a good model for fatigue life prediction for nitinol.
Fatigue life of nitinol is sensitive to composition and processing. Therefore, we 
recommend that you generate a constant life curve specific to your device by 
experimental testing of nitinol samples that are representative of your final manu- 
factured device (e.g., including heat treatment and surface processing) rather than 
leveraging data not specific to your device. Since fatigue life can be adversely or 
favourably affected by pre-strain (e.g., from crimping of a stent onto a delivery 
catheter), we recommend you consider and discuss the effects of pre-strain. We 
recommend that you state and justify the method used to calculate mean and 
alternating strain for fatigue safety factors (e.g., scalar or tensor). 

— e. You should validate the computational model used to analyse the nitinol 
device, and justify the validation activity relative to the context of use (CoU) 
of the computational model, the risk and role of the computational model in 
decision-making, and the range of conditions assessed relative to those in the 
CoU. We also recommend that you justify your choice of the parameter measured 
(e.g., force, strain) and loading path in your validation activities. 

— We recommend that the submission of computational stress/strain analysis reports
follow the "Reporting of Computational Modeling Studies in Medical Device
Submissions Guidance".
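The fatigue safety factor calculation described in item d can be sketched as follows. This is a minimal Python illustration under stated assumptions: the "scalar" option is interpreted here as half the sum and half the difference of the maximum and minimum strain over a cycle, the constant-life curve is represented by linear interpolation between points, and all numerical values are placeholders rather than nitinol data.

```python
def mean_and_alternating(strain_max: float, strain_min: float) -> tuple:
    """Scalar mean strain and strain amplitude of one loading cycle."""
    return (strain_max + strain_min) / 2.0, (strain_max - strain_min) / 2.0


def allowable_alternating(mean_strain: float, curve: list) -> float:
    """Interpolate a constant-life curve given as sorted (mean, allowable alternating) points."""
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        if x0 <= mean_strain <= x1:
            return y0 + (y1 - y0) * (mean_strain - x0) / (x1 - x0)
    raise ValueError("mean strain outside the range covered by the test data")


def fatigue_safety_factor(strain_max: float, strain_min: float, curve: list) -> float:
    """Ratio of allowable to computed alternating strain at the computed mean strain."""
    mean_strain, alt_strain = mean_and_alternating(strain_max, strain_min)
    if alt_strain <= 0:
        raise ValueError("alternating strain must be positive")
    return allowable_alternating(mean_strain, curve) / alt_strain


# Placeholder curve in % strain; a real curve comes from device-specific testing.
PLACEHOLDER_CURVE = [(0.0, 0.4), (2.0, 0.4), (4.0, 0.6)]
safety_factor = fatigue_safety_factor(1.3, 0.9, PLACEHOLDER_CURVE)
```

The sketch refuses to extrapolate beyond the tested range of mean strain, in keeping with the recommendation that the constant life curve be generated from samples representative of the final device.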


ASTM F2996-13, Standard Practice for Finite Element Analysis (FEA) of Non-Modular 
Metallic Orthopaedic Hip Femoral Stems, ASTM International, West Conshohocken, PA, 
2013, www.astm.org 

The other widely used bench test for medical devices was the fatigue testing of hip
stems. The late 1970s saw a number of fatigue fractures for various hip stems, which
drove the ISO to develop a technical standard for the execution of bench tests
in the late 1980s (ISO 7206-3:1988). Every regulatory agency quickly required these bench
tests before granting marketing authorisation for new hip stem designs. The 2013 ASTM
F2996 standard provides a computational complement to the ISO 7206 fatigue tests.


ASTM F3161-16, Standard Test Method for Finite Element Analysis (FEA) of Metal- 
lic Orthopaedic Total Knee Femoral Components under Closing Conditions, ASTM 
International, West Conshohocken, PA, 2016, www.astm.org 


ASTM F3334-19, Standard Practice for Finite Element Analysis (FEA) of Metallic 
Orthopaedic Total Knee Tibial Components, ASTM International, West Conshohocken, PA, 
2019, www.astm.org 

These two standards extend the approach to two popular bench tests for orthopaedic
implants: the fatigue testing of the femoral component and of the tibial component of
knee replacements.


ASTM WK64097. New Practice for Spinal Fusion Cage Computational Modeling

This ASTM work item (standard under development) should provide the computational 
counterpart to the “ASTM F2077-18: Test Methods For Intervertebral Body Fusion 
Devices” experimental protocol. 


Medical devices—Guidance on reporting modelling studies 


FDA-2013-D-1530. Reporting of Computational Modeling Studies in Medical Device 
Submissions. Guidance for Industry and Food and Drug Administration Staff. 

In 2013 the FDA CDRH started to work on a guidance document on how to report the 
results of computational modelling studies in medical device regulatory submissions. The 
first draft was published in 2014, with the final version issued in 2016. Already in 2017, 
220 of the 1500 new medical devices submitted to the FDA included computer models 
and simulations as evidence in their regulatory submissions (Source: AABME). 

According to this guidance, such a report should include:


— Executive Report Summary 
— Background/Introduction 
— Code Verification 

— System Configuration 

— System Properties 

— System Conditions 

— System Discretization 

— Numerical Implementation 
— Validation 

— Results 

— Limitations 


The document also provides, in an annex, more detailed instructions for specific types of
models:


— Computational Fluid Dynamics and Mass Transport 
— Computational Solid Mechanics 

— Computational Electromagnetics and Optics 

— Computational Heat Transfer

— Computational Ultrasound


FDA-2021-D-0980. Assessing the Credibility of Computational Modeling and Simulation 
in Medical Device Submissions. 
https://www.fda.gov/media/154985/download 

This is to date (July 2022) the most complete guidance available on this topic; the
linked version is the final draft of the revision published in December 2021, which will
replace the 2016 version. The guidance provides a generalized framework for assessing
the credibility of computational modelling in a regulatory submission. Since the whole
document is relevant, summarising it here would be difficult. Instead, we provide
an extract of the index for the core chapters:


A. Preliminary steps 
Question of Interest 
Context of use (CoU) 
Model risk 

B. Credibility Evidence 
Code verification results 
Model calibration evidence 
General non-CoU evidence 
Evidence generated using bench-top conditions to support the current CoU 
Evidence generated using in vivo conditions to support the current CoU 
Evidence generated using bench-top conditions to support a different CoU 
Evidence generated using in vivo conditions to support a different CoU 
Population-based evidence 
Emergent model behavior 
Model plausibility 

C. Credibility Factors and Credibility Goals 

D. Adequacy Assessment 


Medical devices—Credibility assessment 
ASME VV40:2018. Assessing Credibility of Computational Modeling through Verification 
and Validation: Application to Medical Devices 

In 2018, ASME published the official version of the first technical standard that specifies a
systematic approach to the credibility assessment of computational models whose results
inform a regulatory submission on medical devices. ASME reports that the standard helps to
“determine and justify the appropriate level of credibility for using a computational model
to inform a decision”; thus, its scope is not limited to the regulatory purpose. Since this
standard is one of the centrepieces of this position paper, we provide here only a summary.

The standard introduces a general principle: the credibility of a computational model
should be commensurate with the risk associated with using the model to influence a
decision. The concept of risk-based acceptability is very general and has been rapidly
endorsed for other medical products as well, as evidenced by a 2019 paper providing a
concrete example for a medical device,² a 2020 paper proposing its use in drug development
for PBPK models,³ and a 2021 paper that extends the proposed use to any mechanistic model
used in drug development.⁴
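To make the risk-based principle concrete, the Python sketch below combines two graded factors into a model risk score. In VV-40, model risk is derived from model influence and decision consequence; the three-level scales and the additive scoring used here are assumptions chosen for illustration and are not prescribed by the standard.

```python
LEVELS = ("low", "medium", "high")


def model_risk(model_influence: str, decision_consequence: str) -> int:
    """Return a risk score from 1 (lowest) to 5 (highest) on an assumed 3x3 matrix."""
    influence = LEVELS.index(model_influence)          # raises ValueError if unknown
    consequence = LEVELS.index(decision_consequence)   # raises ValueError if unknown
    return influence + consequence + 1


# Hypothetical context of use: the model is one of several evidence sources
# (medium influence) for a decision whose failure could harm patients (high consequence).
risk = model_risk("medium", "high")
```

The higher the score, the more rigorous the verification, validation, and uncertainty quantification evidence would need to be for the model to be considered credible for that context of use.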

FDA-2003. Knee Joint Patellofemorotibial and Femorotibial Metal/Polymer Porous-Coated
Uncemented Prostheses—Class II Special Controls Guidance Document for Industry and
FDA
https://www.fda.gov/medical-devices/guidance-documents-medical-devices-and-radiation-emitting-products/knee-joint-patellofemorotibial-and-femorotibial-metalpolymer-porous-coated-uncemented-prostheses

The guidance mentions the use of finite element analysis to support regulatory submis- 
sions: “...Alternatively, finite element analysis (FEA) or other calculations with validation 
of the model and assumed values may be appropriate". 


Medical devices: EU side 
EU-MDR-2017/745: Medical Device Regulation. 

This new legislation defines the rules concerning the placing on the market, mak- 
ing available on the market, or putting into service of medical devices for human use 
and accessories for such devices in the EU. While the legislation does not include spe- 
cific requirements for the use of computer models and simulation for the development of 
medical devices, it explicitly acknowledges their use: 


— "where appropriate, the results of biophysical or modelling research the validity of 
which has been demonstrated beforehand" (Annex I, 10.1a)) 

— "the pre-clinical testing, for example laboratory testing, simulated use testing, computer 
modelling, the use of animal models" (Annex VII, 4.5.4.a)). 


Japanese guidelines 


Ministry of Economics, Industries, and Japan Agency for Medical Research and Develop- 
ment. Guidelines for Developing in silico evaluation. March 2019. 

We were able to access an English translation of this document, dated 2019, that targets 
specifically In Silico Clinical Trials for medical devices. The guideline suggests for the 
model evaluation: 


2 https://doi.org/10.1097/mat.0000000000000996. 
3 https://doi.org/10.1002/psp4.12479. 
4 https://doi.org/10.1002/psp4.12669.

1. Construct a scenario based on a mathematical model (i.e., an expectation based on an
experiment that such a phenomenon will be observed if the model is correct).
2. Run all or part of the scenario as an experiment.
3. Consider whether the experimental results support the scenario.
4. Consider whether the numerical model is reasonable in light of established theories 
and techniques. 
(a) Consider whether the numerical model (experimental system) including numerical 
methods adequately treats the system subject to the mathematical model. 
5. We will discuss the hypotheses that mathematical models entail and the conditions 
under which they are valid. 
(a) Consider whether their logic is interconnected and whether there are any holes. 
(b) When multiple hypotheses are included, it may not be possible to isolate them 
depending on the conditions of the experiment. 




The document also contemplates a case it calls V&V of Unknown Provenance (VOUP),
in which the methods and related elements are known and commonly available, but adequate
records of “who did the experiment" and “how” are unavailable. This resembles the concept
of Software of Unknown Provenance in International Electrotechnical Commission (IEC)
62304. The authors draw a parallel with experimental methods, where some "good practice"
must be adopted for the experiment to be valid, but it is not always possible to trace back
when such good practice was adopted and validated. Similarly, in silico models might use
practices that are known to be valid, but that cannot be documented back to the original
developer. When this is the case, the guideline recommends that “the degree of VOUP
should be clarified, and the in silico evaluation should proceed based on the items that
can be accepted although the details are unknown."
The guideline discusses some essential steps: 


1. Determination of the subject. This is what we call Context of Use. 
2. Setting targets for achieving the task. Objectives set based on available knowledge 
and standards. 
3. List of components to be calculated. 
4. Description of the physical phenomena of numerical calculations and mathematical 
models representing them 
5. Various parameter settings in the mathematical model. These include: 
a. The shape and dimension of the object to be numerically calculated. 
b. Boundary conditions for numerical calculations 
c. Initial conditions for numerical calculation 
d. External input to the object required to perform numerical calculations (e.g., 
energy, load, force, etc.). 
e. Characteristic values that appear in mathematical models and are essential for
numerical computation.
f. Unit system of numerical calculation used for shape, dimension, characteristic
value, external input, etc., and check the consistency of units.
6. Description of the numerical method
7. Description of numerical results
8. Confirmation of conformity with numerical calculations
9. Validation of numerical calculations
10. Validation of in silico assessment
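The unit consistency check in step 5f lends itself to a small automated test. The Python sketch below records each model input with its physical kind and unit and flags those that do not match a declared unit system. The mm, N, MPa, tonne, second system is one consistent choice often used in structural finite element analysis and is assumed here; the class, function, and example inputs are hypothetical.

```python
from dataclasses import dataclass

# Assumed consistent unit system: millimetre, newton, megapascal, tonne, second.
CONSISTENT_UNITS = {"length": "mm", "force": "N", "stress": "MPa", "mass": "t", "time": "s"}


@dataclass(frozen=True)
class Quantity:
    name: str
    value: float
    kind: str   # physical kind, e.g. "length", "force", "stress"
    unit: str   # unit in which `value` is expressed


def inconsistent_units(quantities, system=CONSISTENT_UNITS) -> list:
    """Return the names of quantities whose unit differs from the declared system."""
    return [q.name for q in quantities if system.get(q.kind) != q.unit]


# Hypothetical model inputs; the modulus is deliberately given in the wrong unit.
inputs = [
    Quantity("stent diameter", 8.0, "length", "mm"),
    Quantity("Young's modulus", 193.0, "stress", "GPa"),
    Quantity("radial load", 2.5, "force", "N"),
]
problems = inconsistent_units(inputs)
```

Running such a check before each simulation is a cheap way to document that shape, dimension, characteristic values, and external inputs share one unit system.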


Chinese guidelines 


We were able to access an English translation of this document, dated 2020, that targets
model-informed drug development. The guideline lists the following points to be
addressed during the "Implementation of model analysis":


1. Quality control
2. Model assumption
3. Model verification
4. Model-based analysis plan
5. Model-based analysis report


Of these, items 1 to 3 can be understood as dealing with the evaluation of models.


Glossary 


AFAP As Far As Possible 

ALARP As Low As Reasonably Practicable 

API Application Programming Interface 

ASME American Society of Mechanical Engineers 

ATMP Advanced Therapy Medicinal Products

CAPA Corrective Action and Preventive Action 

CM&S Computational Modelling and Simulation 

COU Context of Use 

CRO Contract Research Organisation 

DMO Digital Mobility Outcome 

DPIA Data Protection Impact Assessment 

EAA Early Awareness and Alert 

EMA European Medicines Agency 

FDA United States of America Food and Drug Administration 

FFR Fractional Flow Reserve 

GCP Good Clinical Practice 

GDPR General Data Protection Regulation 

GLP Good Laboratory Practice 

GMP Good Manufacturing Practice 

GSP Good Simulation Practice 

HIPAA Health Insurance Portability and Accountability Act 

HTA Health Technology Assessment 

ICH International Council for Harmonisation of Technical Requirements for 
Pharmaceuticals for Human Use 

IEC International Electrotechnical Commission 

IEEE Institute of Electrical and Electronics Engineers 

IMDRF International Medical Device Regulators Forum 

ISO International Organization for Standardization 

ISW CoP In Silico World Community of Practice 

MDR Medical Device Regulation [Regulation (EU) 2017/745] 

© The Editor(s) (if applicable) and The Author(s) 2024 143 


M. Viceconti and L. Emili (eds.), Toward Good Simulation Practice, Synthesis Lectures
on Biomedical Engineering, https://doi.org/10.1007/978-3-031-48284-7

MDRO Multi-Drug Resistant Organism

MID Minimal Important Difference 

MMS Method of Manufactured Solutions 

NASA National Aeronautics and Space Administration 
NGS Next Generation Sequencing 

OECD Organisation for Economic Co-operation and Development 
PCP Pre-Commercial Procurement 

PMA Pre-Market Approval 

PPI Public Procurement of Innovative Solutions 
QOI Quantity of Interest 

QSAR Quantitative Structure-Activity Relationship 
RCT Randomised Controlled Trial 

RTM Requirements Traceability Matrix 

SaMD Software as a Medical Device 

SDP Software Development Plan 

SLC Software Life Cycle 

SOP Standard Operating Procedure 

SQA Software Quality Assurance 

SRD System Requirements Document 

SSED Summary of Safety and Effectiveness Data 
UML Unified Modelling Language 

VPH Virtual Physiological Human 


VV-40:2018 ASME standard “Assessing Credibility of Computational Modeling 
through Verification and Validation: Application to Medical Devices” 
VVUQ Verification, Validation and Uncertainty Quantification 


